If an expression selects no atoms or residues, then there is generally no error; that command simply does not do anything. The exception to this is the position vector specification.
Note that atom selections and residue selections cannot be freely mixed
in 'and' and 'or' expressions. The selection expressions are strongly
typed: all terms in one 'and' or 'or' expression must be of the same
type, either atom or residue. However, there are operators that convert
an atom selection into the corresponding residues (contains) and vice
versa (in).
In a 'require ... and ...' expression, exp1, exp2, exp3, ..., expn
must all be true for an atom or residue to be selected. All the
expressions must be of one type, either atom or residue selection.
Note the comma ',' character: it is required between the expressions,
except before the keyword and, where it must not occur.
In an 'either ... or ...' expression, if any one of exp1, exp2, exp3,
..., expn is true for an atom or residue, then that atom or residue is
selected. All the expressions must be of one type, either atom or
residue selection.
Note the comma ',' character: it is required between the expressions,
except before the keyword or, where it must not occur.
The 'not' operator turns the value of exp for each atom or residue
into its opposite value.
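The typing rule and the logical operators described above can be pictured as set operations. The following Python sketch is only an illustration of that reading, not MolScript's implementation; the `Selection` class and the functions `require`, `either` and `invert` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    kind: str           # "atom" or "residue"
    members: frozenset  # identifiers of the selected atoms or residues


def _check_same_kind(selections):
    # Strong typing: every term must be an atom selection,
    # or every term must be a residue selection.
    kinds = {s.kind for s in selections}
    if len(kinds) != 1:
        raise TypeError("all expressions must be of one type: atom or residue")
    return kinds.pop()


def require(*selections):
    """Logical 'and': keep only what every expression selects."""
    kind = _check_same_kind(selections)
    return Selection(kind, frozenset.intersection(*(s.members for s in selections)))


def either(*selections):
    """Logical 'or': keep what at least one expression selects."""
    kind = _check_same_kind(selections)
    return Selection(kind, frozenset.union(*(s.members for s in selections)))


def invert(selection, universe):
    """Logical 'not': `universe` holds all atoms (or all residues) read in."""
    return Selection(selection.kind, frozenset(universe) - selection.members)
```

Mixing an atom selection with a residue selection raises an error in this sketch, which mirrors the requirement that one of the conversion operators (contains, in) be applied first.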
require in residue first_argument and atom second_argument
Note that although the res-atom expression is most often used to select one single atom, it will select all atoms that fit the arguments.
either require in amino-acids and either atom N, atom CA, atom C or atom O or require not in amino-acids and either atom *', atom O%P or atom P
That is, if a residue is an amino acid, then its N, CA, C and O atoms are selected. If it is not an amino acid, then the atoms with names appropriate for the nucleic acid residue phosphate and (deoxy)ribose groups are selected. In the latter case an expression that selects all primed atoms is used.
require in amino-acids and either atom N, atom C or atom O
For all amino acid residues, the peptide atoms (N, C and O) are selected.
either atom H*, atom 1H*, atom 2H* or atom 3H*
That is, all atoms having the names commonly given to hydrogen atoms in a PDB file are selected.
Note that this selection is currently not based on the element specified for the atom in the new (v2.0) PDB file format. It may be in a future version.
The element type of each atom is set when the coordinate file is read.
In the new (v2.0) PDB coordinate file format, the different coordinate sets from an NMR structure determination are given sequential model numbers, starting with 1. This is determined by the MODEL keyword in the PDB coordinate file.
Molecules read from a coordinate file with no MODEL keywords (e.g. an X-ray diffraction structure) will have the model number 0.
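The numbering rule can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the input follows the standard PDB record layout; the function name `model_numbers` is hypothetical and the code is not taken from MolScript.

```python
def model_numbers(pdb_lines):
    """Return the model number attached to each ATOM/HETATM record, in file order."""
    current = 0  # stays 0 when the file has no MODEL records
    numbers = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            # The serial number follows the record name.
            current = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            numbers.append(current)
    return numbers
```

For a file without any MODEL records, every atom ends up with model number 0, as described above.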
If there is more than one stretch of residues that matches, then all
such stretches are selected. For example, if a coordinate file contains
amino acids numbered 1 to 100, and waters also numbered 1 to 57 (as may
occur in PDB files), then a sequence specification "from 5 to 15" will
pick both the stretch of amino-acid residues from 5 to 15, and the
waters from 5 to 15.
This is usually not a problem in connection with commands such as helix or coil, since any selected non-amino acid residues are simply ignored by these. The behaviour can be advantageous when dealing with symmetrical subunits. The name comparison feature can then be used to pick both strands (or whatever) in both chains with one single command.
As a special case, if the first residue in the coordinates that match the 'from' part is an amino-acid residue, then all other first residues (if any) must also be amino-acid residues. This solves a problem that occurs in some PDB files where some amino-acid residues and ligands (hetero groups) have the same name, and the ligands are interspersed between several chains of amino-acid residues.
If a stretch of residues is not finished when the last residue in the currently loaded coordinates is reached, then MolScript issues a warning, but does not produce an error. An error should arguably be the proper response, but there are PDB files where the residue names are such that it is difficult to avoid this.
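The basic stretch rule can be sketched as a single pass over the residue names in file order. The Python below is an illustration only: `select_stretches` is a hypothetical name, the amino-acid special case is not modelled, and discarding an unfinished stretch after the warning is an assumption, since the text above only says that a warning is issued.

```python
import warnings


def select_stretches(names, first, last):
    """Return the indices of the residues in every stretch from `first` to `last`."""
    selected = []
    current = None  # indices of the stretch being collected, or None
    for index, name in enumerate(names):
        if current is None:
            if name == first:
                current = [index]      # a new stretch starts here
        else:
            current.append(index)      # inside a stretch
        if current is not None and name == last:
            selected.extend(current)   # stretch finished
            current = None
    if current is not None:
        # Assumption: warn and drop the unfinished stretch.
        warnings.warn("stretch from %s not finished at end of coordinates" % first)
    return selected
```

With two chains that are both numbered 1 to 20 and carry no chain identifiers, a call such as `select_stretches(names, "5", "15")` returns the indices of residues 5 to 15 in both chains.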
Note that the residue name is left-shifted and its blanks are squeezed out when the coordinate file is read. This means that the chain identifier and insertion code, if any, are part of the residue name, even if they were separate in the input coordinate file.
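As a rough illustration of this squeezing, the Python sketch below builds a residue name from one ATOM or HETATM record. It assumes the standard PDB column layout (chain identifier in column 22, residue number in columns 23 to 26, insertion code in column 27); which columns MolScript itself reads is not stated here, and `residue_name` is a hypothetical helper.

```python
def residue_name(pdb_atom_line):
    """Squeeze chain identifier, residue number and insertion code into one name."""
    field = pdb_atom_line[21:27]   # columns 22-27 (1-based) of the record
    return "".join(field.split())  # drop every blank, which also left-shifts the rest
```

Under these assumptions, residue 12 with insertion code A in chain A would get the name A12A, while residue 123 without chain identifier or insertion code would simply be named 123.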
either type ALA, type SER, type THR, type GLY, type PRO, type CPR, type ASN, type GLN, type ASP, type GLU, type ASX, type GLX, type ARG, type LYS, type HIS, type PHE, type TYR, type TRP, type TRY, type VAL, type ILE, type LEU, type MET, type CYS, type CSH, type CYH or type CSM
All standard three-letter codes for amino acid residues are recognized, as well as some non-standard ones; CPR for cis-proline, ASX for undetermined ASN or ASP, GLX for undetermined GLN or GLU, TRY for tryptophan, and CSH, CYH and CSM for cysteine.
either type H2O, type HHO, type OHH, type HOH, type OH2, type SOL or type WAT
At least some of the commonly occurring residue type designations for water molecules are covered by this expression.
either residue A, residue +A, residue C, residue +C, residue I, residue +I, residue G, residue +G, residue T, residue +T, residue U or residue +U
This covers the common nucleotide bases as well as modified variants of these bases designated according to the PDB conventions.
not either amino-acids, waters or nucleotides
All residues which are neither amino acids, waters nor nucleotides are selected by this expression.
This selection is useful only for molecules read from new (v2.0) PDB format files.
Comparisons between the given atom names, residue types and names, and molecule names in the various selection expressions with those present in the coordinate data follow certain rules:
Comparisons are case-sensitive: Tyr is not equal to TYR.
If the parameter that selects regular-expression matching is off, then
MolScript allows X-PLOR (Brünger 1992) type wildcard characters in the
given strings. If its value is on, then the given string is treated as
a proper regular expression.
atom *      all atoms
atom N*     all nitrogen atoms (and sodium, neon, niobium,...)
atom %G*    all gamma (G) atoms; CG, OG, OG1, SG (and possibly others)
type T*     residue types THR, TRP and TYR (and possibly others)
type T%R    residue types THR and TYR
If the coordinate file contains '*' in atom names (nucleic acids in PDB files) then these are converted into single quotes (') while reading the file. If your coordinate file contains '*' in residue names or types, or '%', '#' or '+' characters anywhere, then you must use a proper regular expression.
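The wildcard examples above can be reproduced with a small translation into Python regular expressions. This sketch handles only the two wildcards that appear in the examples: '*' is read as any run of characters (including none, since N* also matches N) and '%' as exactly one character. The '#' and '+' characters are not modelled, and `wildcard_to_regex` and `matches` are hypothetical names, not part of MolScript.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using '*' and '%' into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any number of characters
        elif ch == "%":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character stands for itself
    return re.compile("".join(parts))


def matches(pattern, name):
    """Case-sensitive match of the whole name against the pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None
```

For instance, `matches("T%R", "THR")` and `matches("T%R", "TYR")` are true while `matches("T%R", "TRP")` is false, in line with the last example above.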
The regular expression syntax follows regexp (except that the "r{m,n}"
feature is not available):
^         beginning of line
$         end of line
.         any character
\<        beginning of word
\>        end of word
[str]     any character in str
[^str]    any character not in str
[x-y]     any character between x and y (ASCII order)
*         any number of the preceding expression
c         the character c, where c is not special
\(r\)     the regular expression r
Caveat: The above description may contain errors, since the source code used for this feature was not very well documented. Also, it hasn't been tested properly.