LECTURE 2A: SOURCES OF UNWANTED VARIABILITY

SLIDE 1

Welcome to lecture block two in our online lecture series. In block two, we are going to talk to you about how to design and implement an expression array experiment so as to identify the genetic underpinnings of human disease.

SLIDE 2

The first thing we need to discuss, before we get into the details of how to design an array experiment, is how to eliminate sources of unwanted variability in expression profiles. You can imagine this as noise in the system that we are trying to rise above and get around so we can identify disease-specific correlates. There are three primary sources of unwanted noise. The first is what we will term SNP noise. This is the normal inter-individual variability caused by one SNP every 1300 bases, which leads to differences in the way we look, differences in intelligence, etc., all defined as within the normal range of human variation. The next source of unwanted variability is in the tissue itself. Imagine you have two patients, each with the same disease, and you get a biopsy from each of them. You might have a different proportion of unaffected and affected cells in the first biopsy as compared to the second biopsy. This tissue heterogeneity can add elements of noise to your experiment. Finally, the disease itself might have completely different genetic bases. This is a fundamental problem in a lot of human genetics, be it positional cloning or complex human genetic studies like array analysis. In a Mendelian study, you might have two families linked to completely different chromosomal loci and caused by different genes. Or you might have different genetic backgrounds coming together to form the same disease, for example diabetes, where the genetic background might be different between patients and yet still look like the same disease.
Genetic heterogeneity is a significant confounding factor. Defining this noise will be the subject of the first two mini-lectures in this block. How to get around this noise will be the subject of module three. How we obtain a candidate gene list from the arrays will be the subject of mini-lecture four in this block. How to statistically validate these results, so that you can take them forward into a translational setting for diagnostics or therapeutics based on these array experiments and the validation, will be the subject of mini-lecture five in this block.

SLIDE 3

How do we design an array experiment? How do we assay the specific clinical variable under study without getting noise from the SNPs or the disease heterogeneity or the tissue heterogeneity? Again, the sources of unwanted variability will be primarily of these three types, and each of these can obscure the interpretation of expression profiling results. So, this prompts an interesting discussion of what is normal versus what is disease. To a large extent disease is societally defined.

SLIDE 4

There is a whole spectrum of normal variation, and most of this is caused by just different genetic backgrounds. And when we say different genetic backgrounds, we basically mean the one base pair out of 1300 at which we are all different. So some people have blue eyes, some people have brown eyes, some are a little taller, some are a little shorter. All of these things are within the normal spectrum of what society defines as normal, but at some point along this spectrum you reach a point at which society deems you to have a disease, which is again a term imposed on you because of what society identifies as a compromise to your body.
And to some degree intelligence can be an interesting cut point that is easily recognized. You then go all the way down to complex genetic conditions like diabetes, cancer, and asthma on the other end of the spectrum, which you inherit as part of your genetic background, and that is clearly a disease state because it causes your body direct harm.

SLIDE 5

So what causes this spectrum? On the left side of the slide you see that normal variation is caused by SNPs. As you move towards the right end of the spectrum the term mutation applies, and in general a mutation is a much more devastating event to genomic DNA.

SLIDE 6

Mutations can be gaping holes or huge deletions in your genetic code, or they can be point mutations, which, just looking at the sequence, look exactly like SNPs but actually change the amino acid coding sequence. They could be out-of-frame mutations, so they'll turn the resultant proteins into a garbled mess of amino acids. Thus, in general, mutations are more severe events and SNPs are more benign events. But when you have a whole pattern of SNPs which comes together with another pattern of SNPs to form a disease state, that is now classified as a mutation and falls in the middle of this spectrum. Moving forward to the right of this spectrum, you have SNPs which change the amount of RNA that is produced. They can be in regulatory regions. They can change the composition of the transcripts that are produced, meaning they change amino acids, and these are generally termed non-synonymous SNPs. These are going to be the SNPs which obscure our expression profiling results. They are normal variants. They change transcript levels within a range of normal which differs from person to person, and in general they don't cause mutations but cause variation between people. Now say you have a mutation which changes the level of the transcript as well. That change will become buried in all of the normal variation.
So how do you get around that? And again, this is just to reinforce the point that these regulatory SNPs can come together in one combination and form a pattern of normal variation, but when they come together in a non-advantageous combination, say in an offspring, they could cause a disease phenotype. And this becomes very difficult to parse out because it is along the spectrum of normal human variation. So, what we are really interested in doing is identifying those transcript changes which are consistently found in people with a disease state. They may sometimes be found in people without the disease state, but if you look across a large number of individuals there is no consistent theme in terms of that SNP noise tracking with a normal or unaffected person's phenotype. So, you'll have random fluctuations in the unaffected person or population, but you will have no fluctuation in the mutation. Thus, the take-away message here is to identify expression correlates of disease samples in cohorts that are consistent across the state, and thus cannot be attributed to random SNP noise.

LECTURE 2B: SNPs

SLIDE 1

I would like to focus this short mini-lecture on contrasting human SNPs with mouse and rat SNPs, and how these are actually quite different when designing expression profiling experiments.

SLIDE 2

As we mentioned, the distinction between SNPs and mutations is a gradient from things that clearly cause disease to things that are just normal variation. But the important point to keep in mind in designing expression profiling experiments is that SNPs can actually influence the expression of disease. For example, patients might have cystic fibrosis and all share the same deltaF508 mutation, but they can respond differently to the environment, and their response to the environment can be modulated by SNPs.
For example, allergens in the environment can predispose their lungs to additional problems with asthma, but that asthma is caused by a SNP in, say, IL13. So, you just have to be careful in identifying the variables that you are studying. Is it environment? Is it a single mutation that is shared, or is it SNPs that influence that mutation?

SLIDE 3

So, there are some additional things that we consider quite often when designing human experiments, in addition to SNPs. One, for example, is sex. Y-linked genes are not present in females, and in fact, when we expression profile males or females, we see the Y-linked genes expressed in males and not in females, which is what you would expect. In addition, many genes are hormone responsive, and that depends on the tissue, on the gene, and on the age of the patient. So all those things need to be considered when designing both your control groups and your experimental groups. Then there is age. Gene expression differs at puberty and at other points in development, whether in humans, mice, rats, or any organism you would like to study. Another intriguing example is X inactivation. Most people think of monozygotic twin females as genetically identical. They are genetically identical, but their expression profiles can often be very different. Why is that? Because X inactivation is not necessarily the same in monozygotic twins. We found a number of female monozygotic twin pairs where one twin was using almost only the paternal X chromosome and the other was using the maternal X chromosome. 10% of the genes are then different in terms of which ones are expressed between those identical twins. So, it is important to even think of things like that, and of drug metabolism SNPs. What drugs are being used on a patient, and is there variation in the response to those drugs? And finally, identifying candidate genes for SNPs can be a goal of expression profiling.
For example, in our lab we are doing a lot of exercise studies on normal volunteers whose muscles are then biopsied, and we look for differentially expressed genes based on different types of exercise. Those differentially expressed genes become candidates for SNP discovery, so we can identify the polymorphisms or SNPs that dictate whether somebody will be a sprinter, or a weightlifter, or a long-distance runner.

SLIDE 4

So, is it the same situation in mouse and rat? Is it as complicated, and are there as many sources of SNP noise? It depends. Most mouse strains are typically inbred. They are generally used as a model to reduce genetic variation through sister/brother matings. There is relatively little SNP noise within the same inbred mouse strain. So, you can go get some Black 6 mice and pretty much rest assured that they are congenic, or genetically identical, and relatively little SNP noise should be present. However, because of that same brother/sister mating and the congenic nature of the strains, if you look at different strains there has been a great deal of founder effect. There are very different genetic backgrounds between different inbred mouse strains, and in fact there is an exaggerated SNP noise difference when looking at different mouse strains. So for example, on our sample submission form for expression profiling in our consortium, we emphatically ask what mouse background you are using and whether all your experiments are done on the same mouse strain, because if they are not, that could tremendously complicate analysis. Transgenics and knockouts are a bit different with regard to mice. Transgenics, you can assume, are in fact a congenic, inbred strain, and the control strain is the so-called background strain in which the transgenic was produced. This is different from knockouts or knock-ins. The reason is that they are generated in different ways.
With transgenics, you are adding a gene to a genetic background, so you are injecting oocytes with a specific gene construct and you know the genetic background of that oocyte or zygote. So, you can get the control strain, the background strain as it is called. Knockouts are generated in a different way, with transfection of ES cells and creation of chimeras, and part of the process of making knockouts or knock-ins is using different mouse background strains. So, it gets more complicated with knockouts, and it actually is problematic to get congenic strains of knockouts. It is just something you have to be aware of when designing experiments. Make sure you have a handle on what the background strain actually is, whether it has been outbred or whether it has been inbred back to being congenic with some background strain. Just make sure you ask those questions and know what is going on.

SLIDE 5

So, in general with mouse, the experimental design is such that you typically mix the tissue samples from multiple mice and can pretty much assume that those mice are very similar. You assume inbred strains have relatively little SNP noise. Outbred knockout strains should be mixed to normalize SNP noise, and make sure that you try to get a congenic strain.

SLIDE 6

Rats are maintained as relatively outbred strains. There are some specific strains of rat, for example, Hooded or Norway brown, but those themselves are outbred. They are isolates that are then outbred, so within the strain, rats can be quite genetically different, and between strains they are even more genetically different. This situation in rats is not all that dissimilar to humans where, sure, most humans are considered outbred, but you do have isolated populations that are considered more inbred, whether it's Indian Eskimos or the Ashkenazi Jewish population or even Caucasians versus Africans or African-Americans.
So, it is similar in rats, where they are generally outbred, so you need to get a handle on that and make sure you are studying a relatively good number of rats. Whether you expression profile them individually or mixed, we will go over in the next session. Rats are generally used as a source of genetic variation in order to actually study that variation, and so, as I mentioned, it is often a good idea to mix tissues from multiple rats to normalize out the SNP noise. Unless you are trying to study SNP noise, in which case you don't want to mix them.

SLIDE 7

So, just to conclude this short section on SNP noise for expression profiling: you need to get a handle on your variables. Whether that means a reduction of variables or an isolation of a certain variable is up to the experimental design, and it is up to you to keep track of sex, age, and strain. Humans and rats have substantial SNP noise that must be taken into account, whereas inbred mouse strains have relatively little SNP noise. But be careful of knockouts, because they are rarely congenic.

LECTURE 2C: METHODS TO OVERCOME UNWANTED VARIABLES

SLIDE 1

In this session we are going to go into quite a bit more detail on experimental design. We are going to talk about methods to overcome unwanted variables or to isolate variables, and particularly address the issue of whether you want to do lots of individual profiles or mixed samples on a limited number of profiles. We will go over the statistical analysis for the different options within this consortium and show how your experimental design ends up influencing how you can interpret your data in the end.

SLIDE 2

So, let's talk about a couple more sources of variability. In the last two mini-lectures, you heard about SNP noise, what that means, and how it depends on whether you are studying humans, rats, or mice. Here let's address a couple more variables. One is tissue heterogeneity and the noise due to that. Now let's compare and contrast tissues versus cell cultures.
Tissues have mixed cell populations and obviously, between different regions of the same tissue, or the same tissue in different individuals, you can have different ratios of those cell populations. So of course, your expression profiles will reflect that. You have the additional complication with patients in that you have pathology. So, let's take cystic fibrosis again. You might want to compare lungs from cystic fibrosis patients to normal lungs, but recognize that there are a lot of differences going on in that tissue, a lot of cellular changes in addition to just the CFTR mutation and the loss of the cystic fibrosis transmembrane conductance regulator protein. You have inflammatory cells. You even have bacteria invading the lung. So, when you are looking at pathological tissue, keep track of what is primary, what is secondary, and what you want to study in that experiment. Now it is tempting to say that cell cultures are the best system to expression profile, and indeed you often have a single cell type on the dish. But there is a whole other plethora of variables, and in fact, in looking at these variables we typically discourage people from cell culture experiments, both from our own personal experience and because of a number of variables that are difficult to control. For example, just take serum. If you take different serum lots, say fetal or even neonate, or different types of serum from different organisms, those are all a tremendous variable. There are different amounts of growth factors, even if you take the same fetal bovine serum. Different fetal bovines are outbred, so you have all sorts of variability within that serum. You have problems with cell density, and problems with maintaining temperature and CO2. In general, we find that maintaining all those extrinsic variables in cell culture can be quite difficult, and so often we go back to the tissue to look at primary problems instead of secondary problems that you can't control.
SLIDE 3

Let's talk about one other thing, which is noise due to the disease. So for example, do the patients and animals under study share the same primary problem? An obvious example that was brought up a number of times is patients with CFTR mutations. They can have the same mutation in the same gene. So, there that is controlled and you shouldn't have much noise, although we went over environmental noise and SNP noise before. An extreme example, on the other hand, is chronic obstructive pulmonary disease, where you have many different etiologies and many different environmental effects which can cause COPD. So, you have a very heterogeneous population. Chronic obstructive pulmonary disease is considered genetically heterogeneous, so you have a lot of noise due to the different disease mechanisms. Patients with CFTR mutations, by contrast, are pretty much considered genetically homogeneous, since all patients have mutations of the same gene. So, in this paradigm it can be okay to mix CF patients to normalize noise, as far as tissue noise and SNP noise go, but it can be very problematic to mix COPD patient tissues because you don't even know if they share the same primary etiology. Often in profiling experiments we search for the underlying genetic heterogeneity; these are termed subclassification experiments.

SLIDE 4

So show me the noise. Where is it? Give me an example. Show me how you design an experiment and how you account for different sources of noise. So, here we are going to use the example of muscular dystrophy. You can also see a written example, which is downloadable from this site as the Sample Proposal Form. Now muscular dystrophy patients have defects in muscle tissue. There is usually a single genetic mutation leading to a single biochemical defect. Now muscle has some advantages. It is relatively homogeneous.
It is about 30% to 50% of your body mass, and as you can tell from going to the supermarket, when you buy different cuts of meat it is still recognizable as meat. So, it is somewhat homogeneous at least. It is generally flash frozen, which is nice, because the standard pathological preparation for muscle is to take it from a patient and as soon as possible put it in isopentane cooled in liquid nitrogen, flash freezing it very quickly, which is ideal for RNA preparation. Now let's take a specific example, Duchenne muscular dystrophy, caused by the first positionally cloned gene. All patients have mutations in the dystrophin gene and are lacking dystrophin in their muscles. So, like cystic fibrosis, we have a genetically homogeneous population under study. In this case, we would like to match for the age of the patient, because the disease is progressive and we want to limit our studies to one stage in the disease. We want to control for sex. Duchenne dystrophy generally affects only males, so we make sure everyone is male. And then we want to make sure the same muscle is sampled.

SLIDE 5

So, how much noise is there in this system and where does the noise come from? What we have done here is taken a single muscle biopsy from a patient, divided it into two parts, and then expression profiled those individually. So, this graph here is a scatter graph. On one axis is the profile from one piece of the muscle biopsy, and on the other axis, the y-axis, is the profile from the second piece of the same muscle biopsy. And you can see that things line up pretty well. So in this case, you have two different regions of the same biopsy that give similar expression profiles. However, let's take a different patient and do the same thing. Here is the same experimental design, but simply done with a different patient. We again have two different regions of another biopsy, but in this case look how different the profiles are.
You see considerable scatter. So, what this analysis tells us is that tissue heterogeneity may or may not be a substantial source of noise in the interpretation. If you are lucky, you can take two regions of the same biopsy and find very similar expression profiles, which implies that the cellular content is very similar between those two pieces. However, in a second patient, we did the same thing and found very different profiles. So, tissue heterogeneity can be a significant source of variation.

SLIDE 6

So, how do you visualize the sources of variability? What you typically do is what is called cluster analysis or hierarchical clustering, and you define the extent of sharing between different profiles. Now in this case I am going to show you unsupervised clustering, where we are not telling the program that there are any variables between the samples that we know about. We are just saying, "Here are all the profiles from a bunch of different Duchenne muscular dystrophy patients. Some of them are mixed. Some of them are individual. Some are just duplicates, but tell me what is related to each other." So, in unsupervised clustering the software itself determines which profiles are most closely related. Here is a large series of profiles from Duchenne muscular dystrophy patient muscle biopsies. One thing I want to point out first is that in one instance we took the same RNA sample, the same hybridization cocktail, and just put it on two different arrays. Those are the profiles labeled 3A-D, for duplicate, and 3A. Now look at the branches of this dendrogram: the lower the branch point, the more closely related the samples are. And if you look across all these profiles, the arrow here is pointing to two profiles that are very highly related. And this is generally what we find.
We find that for the actual procedure, meaning the hybridization of the chip, the scanning of the chip, and the use of the Affymetrix algorithms to determine absolute intensity, the level of variability is very low, and the result based on hybridization intensity is extremely reproducible. In fact, it has gotten quite a bit better over the last couple of years. The Affymetrix production facilities have been upgraded, and the chips have become more and more consistent. So in general, we find that experimental variability, as far as the process of the experiment goes, is very low, and that is shown here in these particular duplicate arrays, which is what we find in general.

SLIDE 7

Okay, what other sources? As I mentioned previously, we have taken two different regions of the same biopsy, as shown in the scatter graphs. Here we are seeing profiles 6A and 6B, which is patient six, one region of the biopsy versus the second region of the same biopsy, and we find that the branch point is relatively low. In fact, the program looks at all these profiles and says, "Yes, these two profiles are highly related," and we say, "Good, because they are from the same patient, so we were hoping they were highly related." This also suggests that, at least in this patient, there isn't much tissue heterogeneity. However, we can look at two different regions of the same biopsy of a different patient, and as shown here, these 1A and 1B profiles, again from different regions of the same biopsy, actually show up in entirely different regions of the dendrogram. In fact, they branch all the way up at the top, even before the normal control individuals branch. So, this is the same sample shown previously, and it shows that depending on your patient and depending on your tissue, tissue heterogeneity can be either a huge source or very little source of variability in your expression profile.

SLIDE 8

So, let's continue this analysis and continue looking at this dendrogram.
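As a rough illustration of how unsupervised hierarchical clustering arrives at those branch points, here is a minimal sketch. It assumes a correlation-based distance and average linkage, which are choices made for this illustration and not necessarily what the clustering software used in this lecture does. The sample names echo the slide, but every signal value is invented.

```python
from itertools import combinations
from statistics import mean

def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for similar profiles, up to 2 for opposite ones."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5

def cluster(profiles):
    """Average-linkage agglomerative clustering (an assumed method).

    profiles: dict of sample name -> list of signal values, same gene order.
    Returns merges as (members_a, members_b, height) in the order they happen;
    the height plays the role of the branch point in the dendrogram.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            return mean(pearson_distance(profiles[i], profiles[j])
                        for i in a for j in b)
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Invented signal values for five hypothetical genes per sample.
profiles = {
    "3A":   [100, 200, 300, 400, 500],   # same RNA, first array
    "3A-D": [101, 199, 302, 398, 501],   # same RNA, duplicate array
    "6A":   [500, 100, 400, 200, 300],   # patient 6, region A
    "6B":   [480, 130, 390, 230, 310],   # patient 6, region B
    "1A":   [300, 500, 100, 200, 400],   # patient 1, region A
    "1B":   [100, 300, 500, 400, 200],   # patient 1, region B
}

merges = cluster(profiles)
for members_a, members_b, height in merges:
    print(members_a, "+", members_b, "at height", round(height, 4))
```

With these invented numbers the duplicate arrays 3A and 3A-D merge first at the lowest height, 6A and 6B merge next, and 1A and 1B are never paired directly with each other, which mirrors the pattern described on the slides.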
Now, we divide five biopsies from five different patients into two parts each. We mix equal amounts of the five biopsies into two pools. Keep in mind that the two pools here are from the same patients, but the two pools are derived from different regions of the biopsies. So, there is actually no RNA or tissue shared between the two pools. We then hybridize each pool to an individual chip. So, where do those show up? Well, that is the arrow here, and in this case it is control individuals. You can see that even though we used different regions of the biopsies, the branch point is extremely low. It is almost right on top of the profiles, and this suggests that when you've mixed, you have in fact normalized SNP noise, because there are multiple individuals here, and you have normalized tissue heterogeneity. And when you think about it, that makes sense. You are now taking many different regions of different biopsies and mixing them all together, so any variability due to tissue heterogeneity should be normalized. You are basically averaging it all out, and the same goes for the multiple individuals. Any SNP noise inherent in this population of patients with Duchenne muscular dystrophy is also normalized out. Mixing does normalize out all these different sources of variability. Of course, a critical point here is that if you are focusing your study on tissue heterogeneity, or if you are studying SNP noise, you definitely don't want to mix patients, because you are effectively normalizing that all out.

SLIDE 9

Okay, so what is appropriate? The bottom line is it depends. It depends on what you want to study.
If you know the primary variable, as where we know that mutations in the CFTR gene cause CF or mutations in the dystrophin gene cause Duchenne dystrophy, and you only care about what is shared and not what is different between these genetically homogeneous patients, then it is perfectly appropriate to mix samples. It saves cost, time, and samples. If you are interested in variation between patients, then certainly this is not appropriate. It is statistically much more powerful to do many individual profiles. Now there are some analytical considerations that must be kept in mind. In other words, once we generate either mixed profiles or individual profiles, how we can analyze that data statistically and functionally in the end will differ depending on the approach.

SLIDE 10

Now one method that we have used quite extensively is iterative or pair-wise comparisons between small numbers of profiles. We have a very extensive description of this protocol, and it is published. We refer you to the website, where there is an extensive amount of text, examples, and data showing the output of this, and the website is given here. It is microarray.cnmcresearch.org\PGA.hgm. Basically, what we are doing is taking a low number of profiles, let's say mixed control profiles, just two of them, and mixed Duchenne muscular dystrophy profiles, just two of them, and with those two controls and two Duchenne profiles it is possible to do four iterative comparisons. We can do control one versus Duchenne one, control one versus Duchenne two, and so on. You end up with four different comparisons and we simply say, "Show us all genes that consistently have fold-changes greater than two." So, two-fold or greater changes in gene expression levels across these four iterative or pair-wise comparisons. The output then is average fold-changes. You do not get P values.
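The four-way comparison just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the gene names, and all signal values are hypothetical, and the actual published protocol may filter genes differently (for example, it relies on the Affymetrix comparison software, which this sketch does not reproduce).

```python
from itertools import product

def consistent_fold_changes(controls, experimentals, threshold=2.0):
    """Return genes whose fold-change meets the threshold, in the same
    direction, in every control-vs-experimental pairing.

    controls / experimentals: lists of dicts mapping gene name -> signal.
    Signals are assumed to be positive; near-zero controls give unstable ratios.
    """
    pairs = list(product(controls, experimentals))  # 2 x 2 -> 4 comparisons
    genes = set(controls[0])
    for profile in controls[1:] + experimentals:
        genes &= set(profile)

    hits = {}
    for gene in sorted(genes):
        ratios = [exp[gene] / ctl[gene] for ctl, exp in pairs]
        increased = all(r >= threshold for r in ratios)
        decreased = all(r <= 1 / threshold for r in ratios)
        if increased or decreased:
            hits[gene] = sum(ratios) / len(ratios)  # average fold-change
    return hits

# Made-up pooled profiles: two control pools and two Duchenne pools.
controls = [
    {"geneA": 100.0, "geneB": 200.0, "geneC": 400.0},
    {"geneA": 110.0, "geneB": 400.0, "geneC": 420.0},
]
duchenne = [
    {"geneA": 330.0, "geneB": 500.0, "geneC": 100.0},
    {"geneA": 440.0, "geneB": 450.0, "geneC": 105.0},
]

hits = consistent_fold_changes(controls, duchenne)
print(hits)
```

In this toy data geneA is up more than two-fold in all four comparisons and geneC is down more than two-fold in all four, so both survive. geneB passes against the first control pool but not the second, so it is dropped, which is the sense in which the comparisons must be consistent.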
An important variable to recognize when you are doing analysis by fold-changes is that you are creating a ratio: a ratio of the expression level of a specific gene in a control profile to its level in a muscular dystrophy or experimental profile. If a gene is not expressed significantly above background in one or the other profile, that low level of gene expression can become your denominator in the ratio. So, say you end up with a five-fold increased expression in your experimental profile, in which case your control becomes the denominator. When your denominator approaches zero or background levels, recognize that this yields an unstable ratio. As you know from high school or even junior high math, if you start dividing by numbers close to zero your ratio goes towards infinity. So, because of that, the lower your denominator gets, the closer it gets to zero, the more you start generating an inaccurate and exaggerated ratio. Fortunately, Affymetrix software takes that into account and will report out what is called a tilde sign, which says, "Look, your denominator is approaching zero. One of these profiles is right at background or even below background. There is definitely a fold-change. You are getting a huge increase in one relative to the other, but take the fold-change with a grain of salt because we are dividing by nearly zero. You are making things go toward infinity, so be careful of this."

SLIDE 11

Data analysis of large numbers of profiles uses a different method. You obviously can't do iterative comparisons if you have 30 profiles. It would generate a very large number of comparisons, and it gets mathematically complex. So, in this case, with a large number of profiles, you generally want to export absolute analyses, which in Affymetrix are called signal values, import these into some other data mining tool such as GeneSpring, SpotFire, or even Excel, which are ones we commonly use, and then generate P values between groups of profiles.
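To make the unstable-ratio point from the fold-change discussion concrete, here is a small numeric sketch. All signal values, and the measurement error of 2 units, are hypothetical numbers chosen only to show the effect; they are not taken from any array.

```python
def fold_change(experimental, control):
    """Ratio of experimental to control signal (control is the denominator)."""
    return experimental / control

experimental_signal = 500.0  # hypothetical signal in the disease profile

# Shrink the control signal toward a hypothetical background level:
# the same experimental signal yields ever larger fold-changes.
for control_signal in (250.0, 50.0, 5.0, 0.5):
    print(control_signal, fold_change(experimental_signal, control_signal))

# A fixed error of +/- 2 signal units barely moves a well-expressed
# denominator but swings a near-background one enormously.
noise = 2.0
stable = (fold_change(500.0, 250.0 + noise), fold_change(500.0, 250.0 - noise))
unstable = (fold_change(500.0, 2.5 + noise), fold_change(500.0, 2.5 - noise))
print("well-expressed control:", stable)
print("near-background control:", unstable)
```

The well-expressed control gives fold-changes that stay very close to 2 either way, while the near-background control gives anything from roughly 100-fold to 1000-fold for the same small error, which is why a flagged fold-change should be read as "changed, magnitude uncertain".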
So, if we take, say, 10 Duchenne dystrophy profiles and 10 normal profiles, we have enough data there to actually generate P values. How significant are the fold-changes or the differences in gene expression for each individual gene? GeneSpring typically does not give you fold-changes. You actually have to do some additional analysis to say, "Okay, we have a highly significant P value for this particular gene, but what does it mean as far as fold-change?" Another thing, which we will go into in subsequent mini-lectures, is that when you have a large number of profiles, temporal clustering becomes an incredibly powerful approach. In other words, you generate a series of profiles at different time points after a single stimulus or variable. For example, one experiment that we will go over later, which we have done in our lab, is to induce muscle degeneration in mice. Once you have induced muscle degeneration experimentally, you then harvest tissue at defined time points during necrosis and then regeneration, profile each of those time points, enter them into your software of choice, and develop temporal clusters. We won't go into it in this lecture, but be prepared for that in the next lecture block. It is an incredibly powerful approach to define coordinately regulated genes, genes downstream of transcription factors, new downstream targets, all sorts of things, and we will go over that later.

SLIDE 12

So, one or two last slides. What is the sensitivity or specificity of mixed versus individual profiles? Here again is another example. We take five Duchenne dystrophy patients, two regions of the same biopsy, all profiled individually, and then take these ten different profiles and put them into GeneSpring to generate P values relative to similar control samples. On the other hand, we take the same RNAs and mix them into two pools.
So, we only have two profiles instead of ten, but now we are going to do the iterative pair-wise comparisons that generate fold-changes. Now let's compare these two approaches. Remember, with GeneSpring you are going to generate P values between two groups of profiles, whereas with the mixed pools of samples you are going to do pair-wise comparisons and generate fold-changes. Individual profiles are shown on the left here. If we take those ten different profiles, which are five Duchenne dystrophy patients all done in duplicate, and compare them to a series of normal controls in GeneSpring, we end up with nearly 1500 genes that have a P value of less than 0.05, which is statistically significant. Another point here: since you are doing so many comparisons, quite a few genes will show a P value of less than 0.05 purely at random, in fact roughly 5% of all the genes under study. So, there should be some noise in this analysis at P less than 0.05. Of these 1500 genes, quite a few may be noise, while many are true differences. Approaches such as permutational P value testing help to eliminate these false positives. On the right we see the other analysis protocol, which is mixed profiles. Here we are only doing two Duchenne and two control profiles, but each is mixed from different regions of the biopsy and many individuals. You see many fewer genes that survive the greater than two-fold change in gene expression in all four iterative comparisons, and again this is explained in more detail on the website. Here you see only 400 genes. So, which genes are shared between the two approaches? What you see is that the large majority of the mixed profile changes are in fact shared with the individual profiles with a P value of less than 0.05. The percentages in the center and on the right, the 85% and the 15%, refer to the mixed profile.
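The comparison of the two gene lists described above amounts to a set overlap, which can be sketched in Python. The gene identifiers below are made up and the lists are tiny; the second part simply redoes the arithmetic with the lecture's round numbers (about 400 mixed-profile genes, nearly 1500 individual-profile genes, 85% shared). Treating the individual-profile list as the reference list is an assumption of this sketch.

```python
def overlap_summary(mixed_genes, individual_genes):
    """Summarize how two candidate gene lists overlap."""
    mixed, individual = set(mixed_genes), set(individual_genes)
    shared = mixed & individual
    return {
        "shared": len(shared),
        # fraction of mixed-profile hits confirmed by the individual profiles
        "mixed_confirmed": len(shared) / len(mixed),
        # fraction of individual-profile hits that the mixed pools recovered
        "individual_recovered": len(shared) / len(individual),
    }


# Toy example with hypothetical gene identifiers.
mixed_hits = ["g1", "g2", "g3", "g4"]
individual_hits = ["g1", "g2", "g3", "g5", "g6", "g7", "g8", "g9", "g10", "g11"]
toy = overlap_summary(mixed_hits, individual_hits)
print(toy)

# The same arithmetic with the lecture's approximate numbers.
approx_shared = round(0.85 * 400)         # genes found by both approaches
approx_recovered = approx_shared / 1500   # share of the individual-profile list
print(approx_shared, round(approx_recovered, 2))
```

With those round numbers, most mixed-profile hits are confirmed, while under a quarter of the individual-profile list is recovered by the pools.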
So, we see that 85% of the genes detected with a greater than two-fold change by the mixed profile approach, which, remember, uses a small number of chips and a mixed bunch of patients, are also found in the individual profiles, so you can see how specific that is. In fact, very few changes, 15%, are not included in the individual profiles at P less than 0.05. So, by this analysis, it seems that mixing is relatively specific but not very sensitive.

SLIDE 13

The conclusion here is that mixing is a low-cost means by which unwanted variables, such as, in this case, SNP noise due to different patients and tissue heterogeneity due to variability within the muscle biopsy, can be normalized. Mixing is generally specific but generally insensitive, depending on the P value cutoff and the fold-change cutoff that are used for comparison. Experimental design must consider all sources of variability, and the profiling experiment must be designed to either normalize or at least account for these variables.

LECTURE 2D: SCREENING STRATEGY TO ID

SLIDE 1

Based on the large amount of inter-individual variability caused by SNP noise or transcriptional flux that we talked about in the first and second mini-lectures in this block, in the third module we put forth a strategy to identify candidate genes which may be correlated with disease-specific processes. Remember, that is what we are trying to figure out and get to rise above the noise caused by SNPs. In the fourth mini-lecture, we will talk about validation techniques. How do we validate these candidates at the protein or functional levels, the ultimate test of whether your experiment was a success? We can use a pooling strategy to eliminate noise and get disease-specific correlates to shine through without really using any statistical analyses.

SLIDE 2

So, the first phase of this screening strategy is to use pooled samples on very large Affymetrix arrays to identify preliminary candidate genes.
In the second phase, those candidate genes can be printed onto glass spotted arrays, and a large number of samples are screened with this inexpensive array carrying a subset of the tens of thousands of genes that we preliminarily screened. Alternatively, we can screen this independent sample set on larger arrays if we have the funds. Now that costs have dropped dramatically for all large arrays, this becomes a feasible (and in fact encouraged) approach.

SLIDE 3

So again, phase one is candidate gene identification. We take five individuals with a given phenotype and five matched individuals without that phenotype, isolate RNA from each individual, synthesize cDNA and cRNA from each sample, and pool the members of each group of five in equimolar amounts. Thereafter, each pool is hybridized to an individual Affymetrix array. For example, in humans we are using the U133A and B chip set, representing ~45,000 transcripts. For mouse we are using the complete U74 chip set, and for rat we are using the complete U34 chip set; each one of these screens tens of thousands of different transcripts in a single hybridization. Again, we routinely identify at least 80% true positives through this strategy. There are certain confounding factors, which we have talked a little bit about, that make this strategy suboptimal, and primarily those would be heterogeneous disease-causing processes. For example, if one primary mutation results in the phenotype in three out of the five samples, and the other two have a different mutation which results in a similar but non-identical phenotype, and both operate through independent processes, you will basically not have a consolidated pathway that rises above the noise, and in fact the two might actually negate each other.
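The cancellation risk described above can be made concrete with a small Python sketch. All signal values are invented, the two-fold cutoff is borrowed from the fold-change discussion earlier in this block, and an equimolar pool is modeled, as a simplifying assumption, as the arithmetic mean of the individual signals.

```python
from statistics import mean


def passes_cutoff(fold_change, cutoff=2.0):
    """True if the change is at least `cutoff`-fold in either direction."""
    return fold_change >= cutoff or fold_change <= 1.0 / cutoff


def pooled_fold_change(disease_signals, control_signals):
    # Assumption: an equimolar pool behaves like the mean of its members.
    return mean(disease_signals) / mean(control_signals)


controls = [1000.0] * 5

# One gene in five patients who all share the same disease process.
homogeneous = [2500.0] * 5
# Same gene when three patients go up two-fold and two go down four-fold.
heterogeneous = [2000.0, 2000.0, 2000.0, 250.0, 250.0]

homogeneous_fc = pooled_fold_change(homogeneous, controls)
heterogeneous_fc = pooled_fold_change(heterogeneous, controls)

print(homogeneous_fc, passes_cutoff(homogeneous_fc))
print(heterogeneous_fc, passes_cutoff(heterogeneous_fc))
# Every heterogeneous patient changes at least two-fold on their own:
print(all(passes_cutoff(s / 1000.0) for s in heterogeneous))
```

In the heterogeneous case each patient individually passes the cutoff, yet the pool does not, which is the situation in which going back to individual arrays becomes worthwhile.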
So, whichever way this falls out, whether or not you obtain a very nice list of candidate genes that all make sense, you know that if you don't, there may be some heterogeneity going on. You have only hybridized four array sets, so you can go back, assume there may be heterogeneous processes, and flesh this out with individual arrays. So, we feel this is a very valid screening approach.

SLIDE 4

The schematic of what this looks like is presented here, where five individuals, illustrated at the top as one through five, have biopsies taken. Those biopsies are split in half in this example; half of each biopsy goes into one pool and the other half goes into another pool. What this does is account for experimental variability and really isolate the biological noise from the experimental noise. All of the labeling processes are done independently on each sample, and finally the samples are pooled and hybridized onto the GeneChips.

SLIDE 5

We have discussed the iterative comparison strategy in the previous mini-lecture.

SLIDE 6

The second phase is validation. One option is to take these lists of change calls, illustrated by steps 1, 2, 3 and 4, and print all those gene probes onto glass slides. We have talked a little bit about where those clone sets are available from; for example, through Research Genetics you can just send them a list of clones and they will send you back the clones. We can use our cDNA array printer to print those onto glass arrays in larger volumes, meaning we can make a hundred slides at a time, and we can screen a very large sample or patient set on a large number of arrays and thereby statistically validate that these outliers are true outliers over a large number of samples. That is done through standard correlation analysis, which we will talk about in a subsequent lecture.
Again, cost permitting, validation should be done on the largest arrays possible on these new samples so that the data may be archived for additional use in other projects.

SLIDE 7

This is an example of an array that is currently used in the lab. It happens to be a 22,000-element cDNA array.

SLIDE 8

So, the phase two statistics are relatively standard correlation statistics, which will be addressed in a subsequent lecture, and there are many different ways you can look at this. You can look at standard differences in means between two classes. You can look at the ratio of variances between two classes. You can look at inter-gene interactions, for example using relevance networks, etc. But the end goal of phase two is to get a P value for each of the genes on these subset arrays that says whether it really is correlated with the disease process or not.

SLIDE 9

So, in conclusion, there are two phases to our experimental design, and this is really based on all of the SNP noise discussion that we have already gone through. A very large screen using large arrays can be done in phase one on two pools, a disease pool and a control pool, as a training set. These candidate genes can either be printed onto custom spotted arrays and much larger sample sets screened for statistical validation (the validation set), or this can be done on large arrays as well. Finally, those genes which show statistical correlations, i.e. P values less than 0.05 with whatever correlation statistic we are using, will be shunted into protein and functional validation, as discussed next.

LECTURE 2E: VALIDATION TECHNIQUES

SLIDE 1

Welcome to mini-lecture five in block two. In this mini-lecture we are going to talk about how to validate the results that you have gotten through your screening strategy. So how do we do that? There are three different levels at which we can validate. We can validate at the RNA level.
We can validate at the protein level, which is taking those results the next step, and finally, which is the gold standard, we can validate through a functional assay to see whether this protein causes the disease or has something to do with the disease.

SLIDE 2

So, if we look at the RNA level, there are three standard techniques available today to validate the array results. Northern blots are the standard low-tech lab technique, which every molecular biology lab has access to and can do quickly and easily. Quantitative reverse transcription followed by PCR is a more robust technique in that it requires lower RNA input amounts, but it is also a little more expensive and difficult to do. And finally, we can use arrays to screen a large number of independent samples for every gene in parallel. So, now you get validation results on every single one of your change calls in a single hybridization over many samples. All of those validation techniques at the RNA level really get at the question of whether your array results are real and whether they hold up over a large number of samples in these two clinical groups. The next step, and this is almost mandatory these days, is to validate those results at the protein level as well. Remember, RNA is translated into protein, and if that protein is not dysregulated concordantly with its transcript, then the RNA results are really meaningless. So, either by Western blot or by immunohistochemistry, we want to say, "Yes, this upregulated transcript results in an upregulated protein which has something to do with the disease." Finally, we want to look at that protein which is dysregulated and say, "If we tone it down or tone it up, does it mediate the disease?" That is the final gold standard for a validation technique.

SLIDE 3

So, validation at the RNA level.
The first thing we'll talk about is the Northern blot. Again, this is a standard low-tech procedure in which total RNA is electrophoresed through an agarose gel. It is size fractionated, transferred onto a nitrocellulose or nylon membrane, and probed with the gene you are interested in looking at and validating. The pros are that it is very easy, very cheap, and very straightforward to do, and the protocols have been optimized. The cons are that you need a fairly large amount of starting RNA. In each well of your Northern blot you need at least 10 micrograms of total RNA, and if you are only working from a very small chunk of flash-frozen human tissue, that could be more than you can get. In addition, that could be an extremely valuable tissue that you don't want to chew up needlessly. This is where the technique falters, in that it is very difficult to decide whether to use your entire tissue on this validation technique. And finally, it is a low-throughput assay, meaning you can only look at one transcript per probing of a Northern. This requires multiple strippings and rehybridizations of the same blot if RNA quantity is limiting, and it really is not going to work for hundreds or thousands of different probes.

SLIDE 4

So, to get around the limiting amount of RNA input that is required for a Northern blot, you can do quantitative reverse transcription followed by PCR. In general, what is done here is that you isolate a very small amount of total RNA, for example one microgram, reverse transcribe it into a cDNA, or copied DNA, strand, and use that as the input into a PCR reaction which is quantitative, meaning that given a set input amount of cDNA, you know how much amplification you get out on the back end and you can compare that across tissues.
This is usually done with an internal standard for amplification which doesn't vary across the different tissues, and it is a nice, robust technique which requires very little RNA input. The cons are that it is very expensive. You sometimes need up to 100,000 dollars of hardware to do this, and each single PCR probe set is fairly expensive to design and use, primarily because fluorescence is involved. So, the assays become expensive, perhaps prohibitively so, for hundreds of differentially expressed genes, and again it is a single-transcript assay, so it is relatively low-throughput. You end up spending a lot of money and a lot of time validating a lot of genes, if you have a lot of genes.

SLIDE 5

What we chose to do was validate at the RNA level using arrays. These arrays can be custom printed or stock arrays. This is a very high-throughput assay, which is the major pro of this validation technique. It is relatively inexpensive, on the order of 50 to 600 dollars per hybridization, and you need relatively low RNA input. When this technology first came about, people were using 100 to 200 micrograms of total RNA, and as you just heard, that is way too much for a biopsy from a clinical sample. You are going to chew through your whole biopsy and probably need more biopsies just to fulfill that requirement. But recent advances in amplification schemes have resulted in an input requirement of often much less than a single microgram of total RNA. So, we are really getting the best of both worlds here: a very high-throughput and inexpensive assay, coupled with a very low input requirement. The only con is that a large amount of hardware is required, on the order of about 150,000 dollars, and some expertise is required to perform this technology. But that is why you have the NINDS/NIMH Microarray Consortium!
After you have done your multiple parallel hybridizations on a number of different tissues, you can then input those intensity values into a program and say, "Well, in this group of 50 non-diseased samples versus these 50 diseased samples, is there a significant difference in the expression level of one gene?" That is what is illustrated in the panel on the left of this slide. On the left, you see tumors at the top which have genes turned on, represented by green spots. On the right, you see a different disease class of tumor which has that gene turned off, shown in red. This is just a graphical illustration, from a hierarchical clustering program that can be downloaded as freeware from Stanford, but what is graphically represented here is a significant P value. There is a significant difference in the expression level of this gene between these tissues, with a P value of less than 0.05, and now you have something hard and fast that you can hang your hat on and move forward into more complex experiments.

SLIDE 6

Okay, so now we have validated some genes at the RNA level in an independent sample set. What's next? We really want to ask the question, "Is it true that this dysregulated gene produces different amounts of protein?" There are a lot of intermediate steps that happen between a transcript being exported from the nucleus and its eventual function. So, we really want to look at the mature protein and say, "Is it differentially expressed between the two states as well?" There are two ways you can do this. The first is by immunohistochemistry: simply taking a paraffin-embedded or flash-frozen biopsy from a clinical sample, slicing it very thin, putting it on a microscope slide, and probing that with an antibody directed against the gene product, or protein, of interest.
This is a very elegant technique which gets at tissue heterogeneity very nicely, because you can say, "All right, my tumor only comprises a third of this tissue block, and that is the third that is lighting up with this antibody." So, immediately you have an assay for tissue heterogeneity which can lend more or less credence to your results, depending on how it falls out; basically, a very elegant, very nice technique. It is not quite high-throughput, in that every tissue section has to be stained with an individual antibody and assayed under a microscope. It is very difficult to get quantitative signals from this type of technology. Another one of the cons is that only about 20% of antibodies currently available work on paraffin sections, and this is what hospitals are full of. Only a small minority of samples are actually frozen as soon as they come out of a patient; you can get almost any antibody to work on a frozen section, but only about 20% of them to work on a paraffin section.

SLIDE 7

Again, this is a single-protein assay. What we can do, though, is run it in parallel to make it a high-throughput assay. On the left you see the paraffin block that you have walked upstairs with from the basement of your hospital. A small-bore punch can be taken from that paraffin block, from a single patient's clinical sample, and rearrayed into a recipient paraffin block which has been prebored. So, now you can actually generate an array of your clinical samples. That block can now be sliced and probed with a single antibody, and you can compare multiple tumors in the same paraffin block. The close-up on the right shows you that you can maintain histology even though you are taking these very small punches. So, now you've got a high-throughput assay for validating your results. Western blots can also be used to validate at the protein level, and that gets you the same answer.
However, it doesn't tell you whether there is tissue heterogeneity in your sample.

SLIDE 8

Finally, the gold standard is validation at the functional level. This is again done on a gene-by-gene basis. It is fairly tedious and low-throughput and really occupies a lot of time in the laboratory, but this is what you ultimately want to do. You want to show that this protein, which is dysregulated, can be blocked or turned on, and that doing so has an impact on the disease state. Here is a figure from a recent paper from our group which shows that in certain brain tumors, you can block an overexpressed protein with a neutralizing antibody and have the tumor lose its metastatic potential. So, now we have gotten to the gold standard, and you can actually take that into the clinic and say, "Okay, well, let's design therapeutics against that protein to block metastasis." This is a real-life example of how array results can get you to the clinical end point of interest, which is therapy. Therapy is what you want to do. Diagnostics is the other arm of what you want to do in the clinic, and based on previous slides, you see that we now have a set of genes which tell tumors apart. So, we can use those sets of genes to go in and diagnose the patient, in this case even before the metastatic disease happens, and tell them whether they can go home and are going to be fine or whether they need these new directed therapies that we are going to be developing.

SLIDE 9

So, the overall conclusions for lecture block two are: one, that SNPs, or SNP noise, can obscure disease-specific expression changes, and you really need to get around that by validation. Rodents can either have SNP noise or not, depending on whether they are outbred or inbred strains, and that noise can also obscure disease-specific expression changes. Two, we can parse out the noise, get around it, and rise above it through a two-tiered strategy.
The first is the candidate gene identification phase, or training set. We start with two pools of clinical samples, five samples in each pool, isolate RNA, generate expression profiles, and obtain a candidate gene list. This list is going to contain a number of true positives and a number of false positives, and we really want to verify which are the real ones and which are the noise that made it through that first filter. The next step is to screen a very large number of new samples on arrays and see if the correlations hold up when blinded to the state of interest. We can then take this more robust gene list forward, validate those genes at the protein level, and finally show that they are functionally relevant.

LECTURE 2F: QUALITY CONTROL

In this mini-lecture we will talk about the generation of high-quality microarray data. As you know, microarrays are powerful tools for the study of gene expression. The number of laboratories relying on this technology is constantly increasing, and to avoid redundant experiments, it is becoming common practice to share microarray profiles on the World Wide Web through expression profile databases. In many cases, however, the quality of the data in these databases is poor due to the lack of quality control. In this lecture, I will list some of the quality control standards that each Affymetrix GeneChip microarray here at Children's National Medical Center in Washington, D.C. needs to meet before it is made public to the scientific community. Quality control checkpoints are applied at different levels during the generation of microarray data, and the ones we pay most attention to are shown in this slide. Now let's go through them one by one. Regarding RNA extraction, we generally prefer to extract total RNA instead of messenger RNA to minimize the loss of transcripts during the procedure.
The minimal amount of total RNA that we work with is six micrograms when doing a single round of amplification/labeling, and its integrity is generally analyzed on a 1% agarose gel. The intensity and size of the two ribosomal bands are the main characteristics that we look at. If the sample does not show good ribosomal bands, it is discarded. The next checkpoint is at the level of the cRNA fold amplification. During this reaction, which uses a T7 RNA polymerase promoter, the cDNA is amplified into biotin-labeled cRNA. The cRNA amplification rate is measured, and samples showing amplification rates below four are not accepted. In addition, this slide shows the use of replicates. In our lab, generating replicates means that each tissue is divided into two parts and each part is analyzed on a microarray. As you can see in the slide, the first biological sample, 1A, is divided into two parts, 1Aa and 1Ab. These two are replicates of the same biological sample. The reason for doing so is to build statistical strength, so that we can compute variances and P values based on real biological replicates rather than using the error model. The error model is described in section three of the GeneSpring tutorial. When replicates are generated, we also calculate the correlation coefficient, or R² value. This is done by loading the values from each metrics file into Excel, and based on the results we decide to keep or discard the microarray. Acceptable R² values vary depending on the tissue used. For microarrays derived from, for example, inbred mice, the minimal acceptable value between replicates is 0.98. For any other tissue we don't accept values below 0.95. Another checkpoint is to look for saturated probe sets. Each microarray, according to protocol, is stained with streptavidin twice, staining one and staining two. The second staining is done to enhance the signal and detect low-abundance transcripts. However, it can lead to saturation of the probe sets.
The saturation is caused by what we call the ceiling effect, that is, the situation in which the signal intensity measured by the laser no longer correlates with the abundance of the real biological transcript; this happens when the signal reaches its maximal intensity. The only way to check for saturation of probes is to plot a scatter graph of the data from the first and second stainings. As you can see in this slide, in the upper right corner, each dot represents a probe set, or gene, and for some of these the similarity between the first and second staining is lost. Some dots are in fact outside the boundaries, and this indicates the saturation effect. The first and second stainings are represented on the x and y axes of this graph, respectively. Identifying saturated probes is very tedious work, so an automated script has been developed in our center for fast and efficient detection; it can be downloaded from this portal site. Let's talk about scaling factors. Scaling factors are computed by the Affymetrix software and are based on the average intensity and global scaling values. This is one form of normalization, linear normalization to be specific. The total intensity of the entire array is determined, and then each gene is multiplied by the same scalar to bring every chip to the same total intensity. Another excellent algorithm for scaling is available in the dChip software suite, which is downloadable through this portal. In each microarray, the signal of each probe is increased or decreased toward a target intensity, and this is done so that microarrays are mutually comparable. The increase or decrease doesn't alter the relative differences across genes in a microarray; it simply multiplies every signal by the scaling factor needed to reach the target intensity set by the user. Experiments done here at Children's have demonstrated that expression profiles are reproducible only with scaling factors between 0.5 and 5.
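The linear scaling described above could look like the following in Python. This is a minimal sketch that assumes the scaling factor is the target intensity divided by the plain average signal of the array; the averaging actually used by the Affymetrix software may differ (for example, by trimming extreme values). The target of 500, the function names, and the signal values are hypothetical, and treating the 0.5 to 5 range as inclusive is also an assumption.

```python
def scaling_factor(signals, target=500.0):
    """Scalar that brings the array's average signal to `target`."""
    average = sum(signals) / len(signals)
    return target / average


def scale_array(signals, target=500.0):
    """Multiply every signal on one array by the same scaling factor."""
    factor = scaling_factor(signals, target)
    return factor, [s * factor for s in signals]


def scaling_factor_acceptable(factor, low=0.5, high=5.0):
    # Range quoted in the lecture; inclusive bounds are an assumption.
    return low <= factor <= high


chip = [100.0, 200.0, 300.0, 400.0]   # hypothetical signal values
factor, scaled = scale_array(chip)
print(factor, scaled, scaling_factor_acceptable(factor))

dim_chip = [10.0, 20.0, 30.0, 40.0]   # a very dim array needs a huge factor
dim_factor, _ = scale_array(dim_chip)
print(dim_factor, scaling_factor_acceptable(dim_factor))
```

Because every signal on a chip is multiplied by the same number, ratios between genes on that chip are unchanged; only the overall brightness moves to the target.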
The last checkpoint I am going to talk about is scrubbing of data. Scrubbing the data means removing all the probe sets that did not change across time points in your experiment, or that are not expressed across all profiles. This can be done in GeneSpring, and we have also written a script to do this, entitled Array Data Manipulation, which is downloadable here. After doing all of these quality controls for Affymetrix arrays, we have found the technology to be extremely stable, reproducible, and transportable. Well, that is all for quality control procedures so far. Please consider that the above-mentioned quality control checkpoints are based on our experience, and other laboratories might compile different ones to best fit their purposes.
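Two of the numeric checkpoints from this mini-lecture, the cRNA amplification rate and the replicate R² value, can be collected into one pass/fail check. The Python sketch below is only an illustration: it assumes R² is the squared Pearson correlation of the two replicates' signal values, and the function names and all data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two replicate signal lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("replicates need equal length and at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def passes_qc(amplification_fold, replicate_a, replicate_b, inbred_mouse=False):
    """Apply the lecture's thresholds: fold >= 4, R^2 >= 0.98 or 0.95."""
    min_r2 = 0.98 if inbred_mouse else 0.95
    r2 = r_squared(replicate_a, replicate_b)
    return amplification_fold >= 4 and r2 >= min_r2, r2


# Hypothetical replicate pair (1Aa vs 1Ab) that agrees closely.
rep_a = [120.0, 450.0, 980.0, 2300.0, 5100.0]
rep_b = [130.0, 430.0, 1010.0, 2250.0, 5200.0]
good = passes_qc(6.0, rep_a, rep_b, inbred_mouse=True)

# Same signals but too little amplification, and a poorly matched pair.
low_amplification = passes_qc(3.0, rep_a, rep_b)
noisy = passes_qc(6.0, [100.0, 200.0, 300.0, 400.0, 500.0],
                  [200.0, 100.0, 400.0, 300.0, 500.0])

print(good, low_amplification, noisy)
```

A real pipeline would read the signal values from the exported files rather than hard-coded lists; the thresholds themselves come from this lecture and, as noted above, other laboratories may choose different ones.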