Well-to-well contamination in the BAC library plates
Potential contamination was assessed by finding all instances where two or more BACs in a contig came from the same plate in a BAC library. We found 7556 such cases in 824 of the 1983 contigs in the physical map. These fell into three classes:
- 5613 cases where 2 BACs came from the same library plate but were not in adjacent wells in the plate.
- 293 cases where the 2 BACs were in adjacent wells.
- 1650 cases where there were >2 BACs from the same plate in a contig.
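The three classes above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used: the `(library, plate, well)` tuple layout, the function names, and the definition of "adjacent" (any of the eight neighbouring wells, diagonals included) are all assumptions.

```python
from collections import defaultdict

def wells_adjacent(well_a, well_b):
    """True if two wells such as 'A01' and 'B02' touch.
    Assumption: diagonal neighbours count as adjacent."""
    row_a, col_a = ord(well_a[0]), int(well_a[1:])
    row_b, col_b = ord(well_b[0]), int(well_b[1:])
    return (well_a != well_b
            and abs(row_a - row_b) <= 1
            and abs(col_a - col_b) <= 1)

def classify_contig(bacs):
    """bacs: list of hypothetical (library, plate, well) tuples for one contig.
    Returns one class label per plate that contributes two or more BACs."""
    by_plate = defaultdict(list)
    for library, plate, well in bacs:
        by_plate[(library, plate)].append(well)

    cases = []
    for wells in by_plate.values():
        if len(wells) < 2:
            continue  # a single BAC from a plate is not a case
        if len(wells) > 2:
            cases.append("more_than_two")
        elif wells_adjacent(wells[0], wells[1]):
            cases.append("adjacent_pair")
        else:
            cases.append("non_adjacent_pair")
    return cases
```

Plates are keyed by library as well as plate number, so plate 12 of one library is never confused with plate 12 of the other.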
We ran two tests to help decide whether these potentially contaminated BACs should be removed from the FPC assembly.
1. Direct measurement of well-to-well contamination
Single sequence reads were attempted for both ends of each BAC in the two libraries, giving BAC end sequences (BES). Within each library, every BES was compared to all others using BLASTN (E-value < 1E-199, bit score > 300, >= 90% of the query sequence covered by the sequence similarity). Instances where BES from adjacent wells matched at this level were recorded and used to build a contamination map for each plate. Because we assume that these BLAST matches indicate actual contamination, we are removing these BACs from the contigs.
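As a minimal sketch of the thresholds quoted above, the filter below keeps only BLASTN hits that meet all three criteria. The `Hit` record and its field names are hypothetical, and the query coverage is assumed to have been computed already as a percentage; how the original analysis parsed the BLAST output is not stated.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str             # BES used as the BLASTN query (hypothetical naming)
    subject: str           # BES that was matched
    evalue: float
    bitscore: float
    query_coverage: float  # percent of the query covered by the match

def passes_thresholds(hit, max_evalue=1e-199, min_bitscore=300.0,
                      min_coverage=90.0):
    """Apply the three cut-offs from the text; self-hits are discarded."""
    return (hit.query != hit.subject
            and hit.evalue < max_evalue
            and hit.bitscore > min_bitscore
            and hit.query_coverage >= min_coverage)

def candidate_matches(hits):
    """Return (query, subject) pairs for every hit that passes the filter."""
    return [(h.query, h.subject) for h in hits if passes_thresholds(h)]
```

The pairs returned here would still need to be checked for adjacent wells before being added to a plate's contamination map.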
2. Simulations to assess the probability that two BACs from a plate could be in a contig by chance
We simulated sampling of the BAC libraries 10,000 times to generate contigs with different numbers of BACs. The simulation parameters were chosen to closely match those of the actual contigs. As expected, our results show that the probability of two unrelated BACs from the same plate being in a single contig increases with the number of BACs in the contig. As shown below, even for relatively small contigs the probability of two unrelated BACs from the same plate being found in the same contig is substantial. For this reason we have decided, for now, to leave BACs from the same plate but in non-adjacent wells in the FPC contigs.
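The idea behind such a simulation can be illustrated with a simplified Monte Carlo sketch. It is not the simulation described above: it assumes every BAC in a contig is drawn uniformly at random from one library, and the number of plates, the 384 wells per plate, and the function names are placeholder assumptions. A closed-form check under the same assumptions is included for comparison.

```python
import random

def simulated_prob_shared_plate(n_bacs, n_plates, wells_per_plate=384,
                                trials=10_000, seed=0):
    """Fraction of random contigs of n_bacs clones in which at least
    two clones come from the same plate (uniform sampling assumed)."""
    rng = random.Random(seed)
    library_size = n_plates * wells_per_plate
    shared = 0
    for _ in range(trials):
        # Draw distinct clones; clone index // wells_per_plate is its plate.
        clones = rng.sample(range(library_size), n_bacs)
        plates = {clone // wells_per_plate for clone in clones}
        if len(plates) < n_bacs:
            shared += 1
    return shared / trials

def exact_prob_shared_plate(n_bacs, n_plates, wells_per_plate=384):
    """Same probability in closed form (a birthday-problem calculation)."""
    library_size = n_plates * wells_per_plate
    p_all_distinct = 1.0
    for i in range(n_bacs):
        unused_plate_clones = max(0, n_plates - i) * wells_per_plate
        p_all_distinct *= unused_plate_clones / (library_size - i)
    return 1.0 - p_all_distinct
```

For example, calling `exact_prob_shared_plate(10, 200)` gives the chance that a 10-BAC contig drawn from a hypothetical 200-plate library contains a same-plate pair purely by chance; comparing values across contig sizes shows how quickly that chance grows.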
| Jackson (Purdue Univ.) | overgos derived from soybean ESTs, selected genomic sequences and genomic sequence around SSRs |
| Shoemaker (Iowa State Univ.) | overgos derived from soybean ESTs and selected genomic sequences; SSRs from Composite Genetic Map and newly identified in BACs assigned to contigs |
| Stacey (Univ. of Missouri) | STSs derived from selected genomic sequences; SSRs from Composite Genetic Map |