Lecture I: Introduction to protein structure. Properties of amino acids.
Handout: Introductory remarks; Schedule; amino acid properties. Reading assignment: Branden-Tooze II, chapter 1 (Introduction); Creighton II, chapter 1.1 to 1.3 (Introduction). [Next lecture: Creighton II, 1.61 (amino acid analysis); 1.62 (sequencing); 1.8 (peptide synthesis)]
A. Levels of Protein Structure
"The subject of biochemistry is primarily the study of the role of enzymes in living system" (from Creighton's introduction); we can also say "the study of the role of proteins". To understand the action of proteins we need information on structure. Proteins are very large and complex molecules; to simplify our discussion of protein structure we customarily distinguish several specific levels of organization. The Danish biochemist Linderstrom-Lang in 1952 devised the basic terminology that is still useful today.
I. Primary or covalent structure. At the most elementary level, primary structure denotes a complete description of all covalent bonds. Since proteins basically (topologically) are linear, head-to-tail arrays of amino acids, primary structure is completely determined by the knowledge of the structure of the 20 naturally occurring amino acids and of their sequence within the chain. Natural covalent cross-links do occur in the form of disulfide bridges; strictly speaking such links are also primary structure features, but they are usually treated as features of tertiary structure because their proper formation requires folding and because, depending on the redox potential of the medium, they can be quite weak - in fact they are largely absent from cytosolic proteins. [The following to be covered in the lecture on folding: The high reducing potential of the cell interior is a consequence of the high ratio of reduced to oxidized forms of glutathione (gamma-glutamyl-cysteinyl-glycine), which acts as a sulfhydryl buffer; on average only 1% of all GSH occurs in the oxidized form (Creighton) (in RBC and most other cells this ratio exceeds 500 - unknown source). Nevertheless, the stability of an SS bond within a protein very much depends on the proximity of the Cys residues and on the folding of the polypeptide chain in which they occur. Even within the cell interior SS bond formation may be favored and the equilibrium constant could reach values of 10^5, indicating that less than 0.01% of a specific pair of Cys residues in a cytosolic protein is in the reduced form. It should be pointed out that few cytosolic proteins are known to contain SS bridges. These are features of secretory proteins or of the extracellular domains of membrane proteins, which face high enough oxygen tensions to make the SS bond a strong stabilizer of the folded structure.]
II. Secondary Structure. This term refers to the regular local arrangements arising from hydrogen bonds between elements of the polypeptide backbone. There are only a small number of choices: helices, sheets, and turns, among which the alpha helix and the beta sheet predominate; these are distinguished by specific positions on the Ramachandran plot. The term supersecondary structure refers to commonly observed motifs or aggregates of secondary structure elements; examples are the alpha-helical coiled coil and the beta barrel.
III. Tertiary Structure. Refers to the folded (3D) structure of a protein, i.e. a complete description of the spatial coordinates of all atoms in the protein, as determined, say, by X-ray crystallography or NMR. Domains are portions of proteins that have their individual evolutionary history and that often fold and unfold independently of other parts of a polypeptide; they often contain or constitute specific sites serving specific functions.
IV. Quaternary Structure. Refers to subunit interactions in a multimeric protein in which 2 or more independently folded chains form higher-order structures by interacting through surfaces that are complementary in shape and charge. Usually two types of such structures are distinguished: isologous and heterologous.
B. Amino Acids, the Building Blocks ("Zeroth Order Structure")
1. What amino acids have in common: Stereochemistry and acid-base properties
Before we set out on a more detailed discussion
of primary structure, let us take a look at the building blocks themselves,
the amino acids. (Amino acid composition could be regarded as the zeroth order
structure of proteins.) All amino acids can be described by the general formula
(draw) [Could also indicate that P is a special case;
the amino acid proline is in fact a cyclic imino acid. This leaves only the
carboxyl function as a component that all amino acids have in common. Since
the imino nitrogen is as basic as the nitrogen in the amino group, it would
be more appropriate to speak of the 20 naturally occurring carboxyl bases].
Note that in all cases where the side chain R is not hydrogen, the central carbon
atom represents a center of asymmetry; in other words, amino acids are optically
active [= chiral, from the Greek word for hand; therefore the synonymous term
handedness for chirality - draw correct form]. All amino acids in proteins are
of the l-form. [In the RS notation of organic chemistry,
it is the S-isomer.] It is not clear why this is so, since
chemical mirror images should not differ in their size, shape, charge, or chemical
reactivity; most scientists believe that this fact is an accident of evolution
and that originally the d-amino acids had the same chance to become building
blocks of biological macromolecules. In fact it is possible
to chemically synthesize proteins made up entirely of d-amino acids. Milton
et al. (Science 256:1445, 1992) reported the synthesis of the d-enantiomer of
HIV protease which, not surprisingly, cleaves substrates which are the mirror
images of natural substrates. Others think that β decay, which is asymmetric,
preferentially destroys d-amino acids. (It may interest you that the asymmetry
of β decay was originally proposed by C. N. Yang, together with T. D. Lee; Yang
headed the Institute for Theoretical Physics here at Stony Brook.) At any rate, it
is clear that there is a requirement for uniformity so that the protein synthetic
apparatus, i.e. the ribosome, can handle all building blocks in an identical
manner. The exception is G in which the side chain equals hydrogen. G
accordingly occurs only in one isomeric form. A simple way to remember the absolute
configuration is Dickerson's bridge mnemonic (draw). Note
that in the RS (Cahn-Ingold-Prelog) notation used by chemists, l-amino acids
are S-compounds, with the exception of l-Cys = R-Cys.
Even if we use the proper geometric representation, the molecule is not correctly drawn. After all, amino acids are acids or - more generally speaking - electrolytes, i.e. they contain ionizable groups. At neutral pH, a given amino acid is much more likely to exist in a doubly charged form [here correct the drawing]. This zwitterionic form predominates between pH 2 and 10. Amino acids are weak electrolytes; they are only partly ionized near neutral pH. Their dissociation behavior can be quantitatively described by the Henderson-Hasselbalch equation [derive from the law of mass action, whose significance for general ligand binding analysis should be stressed].
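The derivation mentioned in the bracketed note can be sketched as follows. The symbols HA (protonated form), A^- (deprotonated form) and K_a (acid dissociation constant) are introduced here for illustration and do not appear in the handout; activities are approximated by concentrations.

```latex
% Law of mass action for the dissociation HA <=> H+ + A-
K_a = \frac{[\mathrm{H^+}]\,[\mathrm{A^-}]}{[\mathrm{HA}]}

% Take the negative decadic logarithm of both sides
-\log K_a = -\log [\mathrm{H^+}] - \log \frac{[\mathrm{A^-}]}{[\mathrm{HA}]}

% With pK = -log K_a and pH = -log [H+], rearrange (Henderson-Hasselbalch)
\mathrm{pH} = \mathrm{p}K + \log \frac{[\mathrm{A^-}]}{[\mathrm{HA}]}

% Fraction of the group that is protonated at a given pH
f_{\mathrm{HA}} = \frac{[\mathrm{HA}]}{[\mathrm{HA}] + [\mathrm{A^-}]}
                = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K}}
```

Setting pH = pK in the last line gives a protonated fraction of 1/2, which is the sense in which pK marks the half-titration point.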
Note that pK is defined as that pH at which 50% of the ionizable functions are protonated; it is consequently a measure of the tendency of the group to release or pick up a proton, i.e. of its strength as an acid or a base. pK values for the common functional groups are quite comparable from one amino acid to another: about 2.2 for carboxyl groups and about 9.5 for amino functions; even the imino function of proline has a pK of around 9. Note that the pK of the α functions in peptides is shifted toward neutrality (carboxyl: ca. 3.5 vs. 2.2; amino: ca. 8.5 vs. 9.5), i.e. ionization is suppressed and the pH range over which the function is ionized or charged is reduced.
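As a rough numerical illustration of the Henderson-Hasselbalch relation, the sketch below estimates how much of a free amino acid without an ionizable side chain is zwitterionic at a given pH. It uses the approximate pK values quoted above (2.2 and 9.5) and assumes that the two groups titrate independently; the function names are made up for this example.

```python
# Henderson-Hasselbalch sketch for a free amino acid without an ionizable side chain.
# The pK values are the approximate ones quoted in the text; treating the two
# groups as independent of each other is a simplifying assumption.

PK_CARBOXYL = 2.2
PK_AMINO = 9.5


def fraction_protonated(ph: float, pk: float) -> float:
    """Fraction of a group in its protonated form: 1 / (1 + 10**(pH - pK))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pk))


def zwitterion_fraction(ph: float) -> float:
    """Fraction of molecules with a deprotonated carboxyl and a protonated amino group."""
    carboxylate = 1.0 - fraction_protonated(ph, PK_CARBOXYL)
    ammonium = fraction_protonated(ph, PK_AMINO)
    return carboxylate * ammonium


if __name__ == "__main__":
    for ph in (1.0, 2.2, 7.0, 9.5, 12.0):
        print(f"pH {ph:4.1f}: zwitterion fraction = {zwitterion_fraction(ph):.3f}")
```

Under these assumptions the zwitterion fraction is close to one at neutral pH and drops to about one half at each of the two pK values, consistent with the statement that the zwitterion predominates between roughly pH 2 and 10.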
2. Unique properties of amino acids
So far we have considered ionizable functions
that all amino acids share. Side chains impart individuality to an amino acid,
and it is to the side chains that we now turn. A list of amino acids and their
side chain structures and pK values is given in the handout. These 20 amino
acids are coded for by genetic information. Many of them
can be modified within the folded protein, e.g. by phosphorylation, without
requiring an expansion of the genetic code. However it is now known that a 21st
amino acid is coded for, namely selenocysteine, which occurs in a few enzymes,
notably glutathione peroxidase. In this case UGA, one of the normal termination
codons, is recruited for the job (Note that a list of 3-letter abbreviations
as well as of single-letter designations is also provided; I would strongly
recommend committing the single letter code to memory. Although they are not
quite as easy to memorize they are widely used). As we shall
see, side chains and their unique ionizable functions dominate the acid/base
or electrolytic properties of polypeptide chains. In addition to carboxyl and
amino functions we find guanidinium, imidazole, phenolic hydroxyl groups etc.
The pK values given are approximate; as in the case of α-amino and α-carboxyl
groups, the ionization of these functions inside proteins is slightly
different. In general, chemical properties of a functional group are determined
by context, i.e. the relationship with other functional groups in the folded
protein. Here recall that all amino acids have at least
2 ionizable groups - this also holds for proline, as imino and amino nitrogens
have very similar pKs.
a. Classification according to polarity: More generally, amino acids can be classified according to their polarity (hydrophobicity/hydrophilicity, or water solubility) into two groups: hydrophilic, i.e. water soluble, and hydrophobic, i.e. insoluble in water and soluble in apolar solvents (they are listed according to this classification on the handout). Hydrophilicity values can be obtained by studying the free energy of transfer from the gas phase to aqueous solution; this is done by using side chain equivalent compounds (e.g. ethanol for threonine). Hydrophobicity values are similarly obtained by studying the partition of amino acids between water and organic solvents. Depending on the approach and the solvent used (octanol, dioxane, cyclohexane etc.), slightly different numerical values are obtained. However, these values are in basic agreement in that amino acids can be grouped into several clusters.
(i) First let us take a look at the apolar residues, which as a rule occur in the interior of folded proteins: G is small and flexible; occasionally, e.g. at helix crossings, it is absolutely necessary because all other amino acids would be too bulky. A is nondescript, a useful brick that is quite abundant in proteins (in E. coli about 10%; 13% according to Schulz and Schirmer; 8.3% according to Creighton II, p. 4, rather than the 6.25% expected from the random occurrence of the alanine codons). The branched side chains of V, I, and L and the aromatic ring of F are stiff bricks from which protein interiors are made; the absence of linear alkyl chains is noticeable. The combination of two crosslinked cysteines, C-S-S-C, also referred to as cystine, is very apolar, and so is M. Another special case is the imino acid P, which, as one might expect, has unique steric properties; being a frequent component of so-called reverse turns, it is often found on the surface of proteins. Finally there is the aromatic amino acid W, which despite its general hydrophobicity can make a hydrogen bond with its indole nitrogen.
(ii) Polar residues are conveniently grouped into nonionizable and ionizable. The former class includes W, already mentioned, the hydroxylated side chains of S and T, and the amides N and Q. They are important because they are rich in H-bonding capability. C and Y become ionized at fairly high pH (in comparison to alcohols, thiols dissociate more readily because of the lower electronegativity of sulfur; phenols because the phenolate anion is stabilized by resonance). Among the ones that are ionized at neutral pH there are 2 acidic side chains, D and E, and 3 basic side chains, R, K, and H. D and E are weaker acids than the alpha carboxyls because of the absence of a neighboring positive charge that facilitates release of the proton; although they differ only by a methylene group, they interact quite differently with the polypeptide backbone (Explain!!). H has a pK value near neutrality and therefore readily participates in proton transfer reactions at physiological pH. It is an important acid-base catalyst that occurs at the active site of many enzymes, although it is otherwise a relatively rare protein constituent.
The classification according to side chain polarity is virtually identical with a topological one. In globular water-soluble proteins amino acids may be grouped into external, internal, and ambivalent residues (handout p.2); these groups obviously differ in accessible surface area. By and large, the measurement of the free energy of transfer of an amino acid or its derivative, or of its partitioning between an aqueous and an organic phase, yields the same results as a careful inventory of the distribution of residues in the inside and on the surface of proteins. Both types of analysis reinforce and confirm each other; their results are combined in the widely used hydropathy index of Kyte and Doolittle. The Dickerson & Geis scheme (handout p.2) highlights two aspects of amino acid classification: 1. classification by polarity parallels topological classification (i.e. polar residues found exclusively on the outside or surface of globular proteins, apolar mostly on the inside), and 2. classification is not all-or-none; there are overlaps between categories (example: W is bulky and hydrophobic, but able to serve as a hydrogen bond donor).
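To make the hydropathy index concrete, here is a minimal sketch of a sliding-window hydropathy profile. The per-residue values are the Kyte-Doolittle scale as commonly tabulated and should be checked against the original publication or the handout. The window length of 9, the function name, and the example sequence are assumptions made for illustration only.

```python
# Sliding-window hydropathy profile (sketch).
# Scale: Kyte-Doolittle hydropathy values as commonly tabulated; verify before use.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along a one-letter-code sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard letters
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: a charged stretch followed by an apolar stretch.
    example = "KDEERKDSGLLVIALIVFAGL"
    for position, value in enumerate(hydropathy_profile(example), start=1):
        print(f"window starting at residue {position:2d}: {value:+.2f}")
```

Positive window averages point to stretches that are likely buried or membrane-embedded, negative ones to stretches likely to be exposed on the surface. The choice of window length trades resolution against noise.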
Spectroscopic properties: Obviously the side chains, in particular those of the more hydrophilic amino acids, also differ in their chemical reactivities, especially their ability to participate in nucleophilic and electrophilic substitutions and to undergo redox reactions. We will go over these properties later when we talk about the analysis and manipulation of proteins using chemical probes; in addition, chemical reactivities will be seen to play crucial roles in enzyme mechanisms. For now I would like to limit discussion to spectroscopic properties. The aromatic side chains of F, Y, and W impart special spectroscopic properties not only to these specific amino acids themselves but also to the peptides and proteins in which they occur. W and Y are responsible for almost all of the near UV absorbance of proteins, with F making a minor contribution (see Fig 1-3 from Creighton). (Analysis of proteins by spectroscopic techniques (CD, ORD, fluorescence, NMR etc.) will be covered in the Physical Biochemistry course.) Note also that A280 contains a component that is contributed by disulfide bridges:
$$\varepsilon_{280} = 5540\, n_{\mathrm{W}} + 1480\, n_{\mathrm{Y}} + 134\, n_{\text{S-S}} \quad (\mathrm{M^{-1}\,cm^{-1}})$$
i.e. in a peptide containing one W, one Y, and one disulfide bridge, W would account for ~77%, Y for ~21%, and cystine for ~2% of the absorbance at 280 nm.
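The arithmetic behind these percentages can be checked with a short sketch. It uses the three coefficients quoted above, taken here as molar absorption coefficients at 280 nm, and simply sums the contributions; the function names are invented for this example, and any effect of the protein environment on the chromophores is ignored.

```python
# Estimate the 280 nm absorption coefficient of a peptide from its composition.
# Coefficients are the ones quoted in the text (per W, per Y, per disulfide bridge).
EPS_TRP = 5540
EPS_TYR = 1480
EPS_CYSTINE = 134


def epsilon_280(n_trp: int, n_tyr: int, n_cystine: int) -> int:
    """Sum of the W, Y, and disulfide contributions at 280 nm."""
    return EPS_TRP * n_trp + EPS_TYR * n_tyr + EPS_CYSTINE * n_cystine


def percent_contributions(n_trp: int, n_tyr: int, n_cystine: int) -> dict[str, float]:
    """Percentage of the total 280 nm absorbance due to each chromophore."""
    total = epsilon_280(n_trp, n_tyr, n_cystine)
    if total == 0:
        raise ValueError("peptide has no W, Y, or disulfide chromophore")
    return {
        "W": 100.0 * EPS_TRP * n_trp / total,
        "Y": 100.0 * EPS_TYR * n_tyr / total,
        "S-S": 100.0 * EPS_CYSTINE * n_cystine / total,
    }


if __name__ == "__main__":
    # One W, one Y, one disulfide bridge, as in the example in the text.
    for name, pct in percent_contributions(1, 1, 1).items():
        print(f"{name}: {pct:.1f}%")
```

For a real protein the counts of W, Y, and disulfide bridges would be taken from the sequence and the known disulfide pattern; the result is an estimate, not a measured value.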