Modeller

Installation

Download the software from the Modeller site.

Follow the installation instructions (here).

Instructions (simple homology modeling)

Basic modelling (here)

Search the target sequence

Put the target sequence in PIR format.

>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI
GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV
EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*

The first line contains the sequence code, in the format ">P1;code".
The second line has ten colon-separated fields that generally describe the structure file, if applicable. Only two of these fields are used for sequences: "sequence" (indicating that the file contains a sequence without known structure) and "TvLDH" (the model file name).
The rest of the file contains the sequence of TvLDH, with "*" marking its end.
The standard one-letter amino acid codes are used. Note that they must be upper case; some lower case letters are used for non-standard residues. See the file modlib/restyp.lib in the Modeller distribution for more information.
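
The layout described above can be sketched in a few lines of Python. This is only an illustration, not part of Modeller: the function name format_pir_sequence and the 75-character line width are assumptions, and the last two header fields simply copy the values used in the TvLDH example.

```python
def format_pir_sequence(code, sequence, width=75):
    """Return a PIR entry for a sequence without known structure.

    Hypothetical helper: it assumes only standard residues, because it
    upper-cases the whole sequence (lower case is reserved for
    non-standard residues).
    """
    cleaned = "".join(sequence.split()).upper()
    header = ">P1;" + code
    # Ten colon-separated fields; only the first two matter for sequences.
    # The last two values copy the TvLDH example in this page.
    fields = ["sequence", code] + [""] * 6 + ["0.00", " 0.00"]
    description = ":".join(fields)
    # The sequence ends with "*" and is split into fixed-width lines.
    terminated = cleaned + "*"
    body = [terminated[i:i + width] for i in range(0, len(terminated), width)]
    return "\n".join([header, description] + body) + "\n"


if __name__ == "__main__":
    print(format_pir_sequence("TvLDH", "MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQ"))
```

The returned text can be saved as TvLDH.ali and read by the search script.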

Run Modeller

It is suggested to run Modeller scripts with a Python installation. Otherwise, the scripts can be launched with mod9.24, where 9.24 is the version installed on the computer. The version number differs between installations, so check which version is installed before running the command.
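
As a rough aid for the version check, the sketch below extracts the version number from a launcher name such as mod9.24. The helper name launcher_version and the assumption that launchers are always called mod followed by the version (optionally with .exe) are illustrative, not taken from the Modeller documentation.

```python
import re

# Assumed naming pattern: "mod" + version, optionally followed by ".exe".
MOD_NAME = re.compile(r"^mod(\d+(?:\.\d+)+)(?:\.exe)?$", re.IGNORECASE)


def launcher_version(file_name):
    """Return the version encoded in a launcher name, or None."""
    match = MOD_NAME.match(file_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("mod9.24", "mod10.4.exe", "model.py"):
        print(name, "->", launcher_version(name))
```

Applied to the file names found in the Modeller bin directory, this shows which command name to type instead of mod9.24.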

To run Modeller, the command window must be launched from the Modeller App so that the environment is set.

In the search box, type Modeller and then click on Modeller App.

In the command window that opens, change the working directory using:

cd c:\WORKING DIRECTORY

The path of the working directory can be found with File Explorer, where it is shown in the address bar below the command section.

Search for potential related sequences of known structure

The search can be performed with the profile.build() command of MODELLER:

from modeller import *

log.verbose()
env = environ()

#-- Prepare the input files

#-- Read in the sequence database
sdb = sequence_db(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)

#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')

#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')

#-- Read in the target sequence/alignment
aln = alignment(env)
aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')

#-- Convert the input sequence/alignment into
#   profile format
prf = aln.to_profile()

#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)

#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')

#-- Convert the profile back to alignment format
aln = prf.to_alignment()

#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')

The profile.build() command has many options:

In this example, rr_file is set to use the BLOSUM62 similarity matrix (file "blosum62.sim.mat", provided in the MODELLER distribution).
Accordingly, the parameters matrix_offset and gap_penalties_1d are set to the appropriate values for the BLOSUM62 matrix.
For this example, only one search iteration is run, by setting the parameter n_prof_iterations to 1.

Thus, there is no need to check the profile for deviation (check_profile is set to False).

Finally, the parameter max_aln_evalue is set to 0.01, indicating that only sequences with e-values smaller than or equal to 0.01 will be included in the final profile.
The database sequences are read from the file "pdb_95.pir", which must be present in the directory where the search is launched.
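
Because the search fails if an input file is missing, a small pre-flight check can be run before launching the script. This is a sketch in plain Python; the helper name missing_inputs is hypothetical, and the default file names are the ones used by the script above.

```python
import os


def missing_inputs(directory, required=("TvLDH.ali", "pdb_95.pir")):
    """Return the required input files that are absent from directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    absent = missing_inputs(os.getcwd())
    if absent:
        print("Missing input files:", ", ".join(absent))
    else:
        print("All input files found.")
```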

Run Modeller:

mod9.24 01_build_profile.py

Warning: 9.24 is the version installed on the computer used here. Replace it with the version installed on your own computer.

Output:

log file, with all information about the run
prf file (build_profile.prf), the profile in text format; the table of results is an extract of this file (omitting the aligned sequences).

Most important columns in the table of results:

The second column reports the code of the PDB sequence.
The eleventh column reports the percentage sequence identity.

In general, a sequence identity above approximately 25% indicates a potential template, unless the alignment is short (i.e., less than 100 residues).

The twelfth column reports the e-value of the alignment.

An e-value of 0 indicates a very significant similarity.
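
The selection rules above can be applied automatically to the lines of the prf file. The sketch below is an illustration under stated assumptions: the function name select_templates is hypothetical, the columns are assumed to be whitespace-separated with the alignment length in the tenth column, and the sample rows are invented for the demonstration.

```python
def select_templates(prf_lines, min_identity=25.0, min_length=100,
                     max_evalue=0.01):
    """Return (code, identity, e-value) for hits that pass the thresholds."""
    hits = []
    for line in prf_lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and header comments
        fields = stripped.split()
        if len(fields) < 12:
            continue  # not a result row
        code = fields[1]              # second column: PDB sequence code
        aln_length = int(fields[9])   # tenth column (assumed): alignment length
        identity = float(fields[10])  # eleventh column: % sequence identity
        evalue = float(fields[11])    # twelfth column: e-value
        if (identity > min_identity and aln_length >= min_length
                and evalue <= max_evalue):
            hits.append((code, identity, evalue))
    # Most significant hits (smallest e-value) first.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Invented rows, only to show the expected layout.
    sample = [
        "# Number of sequences:      3",
        "    1 TvLDH  S  0  335   1  335   0    0    0   0.  0.0",
        "    2 1abcA  X  1  312  75  242  63  229  164  28.  0.83E-08",
        "    3 2xyzB  X  1  300  10   80  12   82   70  40.  0.50E-03",
    ]
    for hit in select_templates(sample):
        print(hit)
```

With a real run, the lines would be read from build_profile.prf, for example with open('build_profile.prf').readlines().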

Alignment of template structures

To select the most appropriate template for the query sequence among the similar structures, compare.py is used to assess the structural and sequence similarity between the candidate templates.

The following script is used when more than one PDB structure was found.

from modeller import *

env = environ()
aln = alignment(env)
for (pdb, chain) in (('1TXU', 'A'), ('4q9u', 'A'), ('4n3z', 'A'),
                     ('2ot3', 'A')):
    m = model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)

The script has to be adapted to the system under study:

Change the PDB codes according to your own search, here 1TXU in ('1TXU', 'A').
Check the correct chain in the PDB file, here A in ('1TXU', 'A').
Check the parentheses in the script, according to Python syntax.
Put the PDB files in the working directory of the script.
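
To check which chains a PDB file contains, the chain identifier can be read from column 22 of the ATOM and HETATM records. The following sketch uses plain Python rather than Modeller; the function name chains_in_pdb and the file name in the usage comment are hypothetical.

```python
def chains_in_pdb(pdb_lines):
    """Return the chain identifiers found in ATOM/HETATM records, in order."""
    chains = []
    for line in pdb_lines:
        # In the PDB format the chain identifier is the 22nd character.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


# Usage (hypothetical file name):
# with open("1txu.pdb") as handle:
#     print(chains_in_pdb(handle))
```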

When only one PDB structure was found, the for loop must still iterate over a tuple of (code, chain) pairs, so the script has to be modified according to Python syntax.

from modeller import *

env = environ()
aln = alignment(env)
for (pdb, chain) in (('1txu', 'A'), ):
    m = model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)

Warning: the trailing comma in the for instruction is mandatory, because it makes Python treat the outer parentheses as a one-element tuple.
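
The effect of the trailing comma can be checked in plain Python, without Modeller. The variable names below are chosen only for this illustration.

```python
# With the trailing comma: a tuple holding one (code, chain) pair.
templates = (('1txu', 'A'), )
for (pdb, chain) in templates:
    print(pdb, chain)

# Without the comma the outer parentheses are ignored, so this is just
# the pair ('1txu', 'A'); looping over it yields the strings '1txu' and
# 'A', which cannot be unpacked into (pdb, chain).
no_comma = (('1txu', 'A'))
print(len(templates), len(no_comma))
```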

Finally, the script can be launched.

mod9.24 02_compare.py

At the end of the loop, all of the structures are in the alignment, but they are not ideally aligned to each other (append_model creates a simple 1:1 alignment with no gaps).
Therefore, the alignment is improved by using malign to calculate a multiple sequence alignment.
The malign3d command then performs an iterative least-squares superposition of the 3D structures, using the multiple sequence alignment as its starting point.
The compare_structures command compares the structures according to the alignment constructed by malign3d.

It does not make an alignment, but it calculates the RMS and DRMS deviations between atomic positions and distances, differences between the mainchain and sidechain dihedral angles, percentage sequence identities, and several other measures.

Finally, the id_table command writes a file with pairwise sequence distances that can be used directly as the input to the dendrogram command (or the clustering programs in the PHYLIP package).

The dendrogram command calculates a clustering tree from the input matrix of pairwise distances, which helps to visualize the differences among the template candidates.
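
To make the RMS deviation reported by compare_structures more concrete, the sketch below computes it for two lists of already superposed, equivalent atoms. It is a minimal illustration of the formula, not Modeller's implementation; the function name rmsd and the coordinates are assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Invented coordinates for two equivalent atoms in each structure.
    first = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    second = [(0.0, 0.0, 1.0), (1.5, 1.0, 0.0)]
    print(rmsd(first, second))
```

The smaller the value, the closer the two structures are after superposition.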