Federico Stefanini

Paper #: 98-05-042

Current techniques of DNA analysis make it possible to score hundreds of electrophoretic bands from a small amount of an individual's DNA. In pilot experiments, the sample size is typically orders of magnitude smaller than the number of bands contained in a molecular profile. Nevertheless, it is important to reduce the number of scored bands by identifying the subsets that contain useful information. The informative components identified in this way can then be characterized further using larger samples, so that experimental resources are not wasted. Highly informative profile components improve the ability to assign an individual of unknown membership to its actual group on the basis of the scored band pattern. The features of this class of experiments make standard statistical modeling cumbersome to apply. We propose the use of a Genetic Algorithm to obtain all the profile components that are highly informative given the available data. The objective function used by the Genetic Algorithm is justified theoretically by Bayesian predictive arguments. Results from simulated datasets suggest that the implemented procedure is very promising for identifying highly informative molecular profile components within the large set of observable bands produced in pilot experiments. Thus, Genetic Algorithms may be used to reduce the complexity of modern experiments in applied genetics.
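To make the idea concrete, the following is a minimal sketch of a genetic search over band subsets. It is not the procedure implemented in the paper: the abstract does not specify the objective function, so the sketch assumes a simple stand-in, namely the leave-one-out log predictive probability of the true group labels under independent Beta-Bernoulli models for each selected band. The names `loo_log_predictive` and `genetic_search`, the Beta(1, 1) prior, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def loo_log_predictive(mask, X, y, a=1.0, b=1.0):
    """Assumed objective: leave-one-out log predictive probability of the
    true group labels, with an independent Beta(a, b)-Bernoulli model for
    each selected band (an illustrative stand-in, not the paper's)."""
    bands = [j for j, keep in enumerate(mask) if keep]
    groups = sorted(set(y))
    total = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        log_joint = {}
        for g in groups:
            # Profiles of group g, excluding the held-out individual i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == g]
            n = len(rows)
            # Smoothed group prior; sums to one over the groups.
            lp = math.log((n + 1.0) / (len(X) - 1 + len(groups)))
            for j in bands:
                ones = sum(r[j] for r in rows)
                p1 = (ones + a) / (n + a + b)  # predictive P(band present)
                lp += math.log(p1 if xi[j] else 1.0 - p1)
            log_joint[g] = lp
        # Normalize with log-sum-exp to get the log posterior of the true group.
        m = max(log_joint.values())
        log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
        total += log_joint[yi] - log_norm
    return total


def genetic_search(X, y, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Evolve binary masks over bands; return (best_mask, best_score)."""
    rng = random.Random(seed)
    n_bands = len(X[0])

    def fitness(mask):
        return loo_log_predictive(mask, X, y)

    # Start from sparse random masks (about 10% of bands switched on).
    pop = [[rng.random() < 0.1 for _ in range(n_bands)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_bands)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability p_mut.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Here `X` is a list of 0/1 band profiles (one list per individual) and `y` the corresponding group labels; calling `genetic_search(X, y)` returns a boolean mask over bands together with its score. This sketch returns a single best mask, whereas the abstract speaks of obtaining all highly informative components, so a fuller version would need to keep and compare several high-scoring masks.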