Construction and improvement of the soybean Williams 82 physical map is an ongoing project. This page describes some of the methods being used to improve and extend the initial FPC assembly.

Well-to-well contamination in the BAC library plates
Potential contamination was assessed by finding all instances where two or more BACs in a contig came from the same plate in a BAC library. A total of 7556 such cases were found in 824 of the 1983 contigs in the physical map. These fell into three classes:
  - 5613 cases where 2 BACs came from the same library plate but were not in adjacent wells in the plate.
  - 293 cases where the 2 BACs were in adjacent wells.
  - 1650 cases where there were >2 BACs from the same plate in a contig.
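
The three-way classification above can be sketched in a few lines of Python. This is only an illustration, not the script used for the physical map: the input layout (contig, library, plate, well), the well label format ("A01" style) and the definition of "adjacent" (wells that touch horizontally, vertically or diagonally) are all assumptions made for the example.

```python
from collections import defaultdict


def well_coords(well):
    """Convert a well label such as 'B07' to zero-based (row, column).

    Assumes a letter row followed by a numeric column (hypothetical format).
    """
    return ord(well[0].upper()) - ord("A"), int(well[1:]) - 1


def wells_adjacent(well_a, well_b):
    """True if two distinct wells touch, diagonals included (an assumption)."""
    (row_a, col_a), (row_b, col_b) = well_coords(well_a), well_coords(well_b)
    if (row_a, col_a) == (row_b, col_b):
        return False
    return abs(row_a - row_b) <= 1 and abs(col_a - col_b) <= 1


def classify_same_plate_cases(bacs):
    """Count same-plate cases per contig.

    bacs: iterable of (contig, library, plate, well) tuples.
    A "case" is one (contig, library, plate) group holding two or more BACs.
    """
    groups = defaultdict(list)
    for contig, library, plate, well in bacs:
        groups[(contig, library, plate)].append(well)

    counts = {"pair_non_adjacent": 0, "pair_adjacent": 0, "more_than_two": 0}
    for wells in groups.values():
        if len(wells) < 2:
            continue  # a single BAC from a plate is not a case
        if len(wells) > 2:
            counts["more_than_two"] += 1
        elif wells_adjacent(wells[0], wells[1]):
            counts["pair_adjacent"] += 1
        else:
            counts["pair_non_adjacent"] += 1
    return counts
```

Calling classify_same_plate_cases on the full list of BACs in the assembly would return one count per class, which could then be compared with the figures listed above.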

We ran two tests to help decide whether these potentially contaminated BACs should be removed from the FPC assembly.

1. Direct measurement of well-to-well contamination
Single sequence reads were attempted for both ends of each BAC in the two libraries. Within each library, every BAC end sequence (BES) was compared to all others using BLASTN (e<1E-199, bit score >300, >=90% of the query sequence covered by the sequence similarity). Instances where BES from adjacent wells matched at this level were recorded and used to build a contamination map for each plate. Because we assume that these BLAST matches indicate actual contamination, we are removing these BACs from the contigs.
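
As a rough illustration of the filtering step, the sketch below applies the three thresholds quoted above to a list of BES-to-BES hits and collects the adjacent-well pairs per plate. The BesHit record, its field names and the adjacency rule (wells that touch, diagonals included) are hypothetical; the hits are assumed to have been parsed from BLASTN output already and reduced to pairs from the same plate.

```python
from collections import defaultdict
from typing import NamedTuple

# Thresholds as quoted in the text above.
MAX_EVALUE = 1e-199
MIN_BIT_SCORE = 300.0
MIN_QUERY_COVERAGE = 90.0  # percent of the query covered


class BesHit(NamedTuple):
    """One BES-to-BES match within a plate (hypothetical record layout)."""
    plate: int
    query_well: str
    subject_well: str
    evalue: float
    bit_score: float
    query_coverage: float


def well_coords(well):
    """Convert a well label such as 'B07' to zero-based (row, column)."""
    return ord(well[0].upper()) - ord("A"), int(well[1:]) - 1


def wells_adjacent(well_a, well_b):
    """True if two distinct wells touch, diagonals included (an assumption)."""
    (row_a, col_a), (row_b, col_b) = well_coords(well_a), well_coords(well_b)
    if (row_a, col_a) == (row_b, col_b):
        return False
    return abs(row_a - row_b) <= 1 and abs(col_a - col_b) <= 1


def passes_thresholds(hit):
    """Apply the e-value, bit score and query coverage cutoffs."""
    return (
        hit.evalue < MAX_EVALUE
        and hit.bit_score > MIN_BIT_SCORE
        and hit.query_coverage >= MIN_QUERY_COVERAGE
    )


def contamination_map(hits):
    """Map each plate to the set of adjacent well pairs with a passing hit."""
    flagged = defaultdict(set)
    for hit in hits:
        if passes_thresholds(hit) and wells_adjacent(hit.query_well, hit.subject_well):
            # frozenset makes A01/A02 and A02/A01 count as the same pair
            flagged[hit.plate].add(frozenset((hit.query_well, hit.subject_well)))
    return dict(flagged)
```

Storing each pair as an unordered set means reciprocal hits are recorded once per plate.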

2. Simulations to assess the probability that two BACs from a plate could be in a contig by chance
We simulated a sampling of the BAC libraries 10,000 times to generate contigs with different numbers of BACs. The simulation parameters were chosen to closely match those of the actual contigs. As expected, our results show that the probability of two unrelated BACs from the same plate being in a single contig increases with the number of BACs in the contig. As shown below, even for relatively small contigs the probability of two unrelated BACs being found in the same contig is substantial. For this reason we have decided for now to leave BACs from the same plate but in non-adjacent wells in the FPC contigs.

            
# of BACs     % simulation runs w/o     % simulation runs with
in contig     2 BACs from a plate       2 BACs from a plate
 2            99.67                      0.33
 3            98.94                      1.06
 4            98.02                      1.98
 5            96.42                      3.58
 6            94.68                      5.32
 7            92.83                      7.17
 8            90.9                       9.1
 9            87.86                     12.14
10            85.5                      14.5
12            78.36                     21.64
14            72.32                     27.68
16            64.55                     35.45
18            57.86                     42.14
20            50.51                     49.49
24            36.41                     63.59
28            25.06                     74.94
32            15.78                     84.22
40             5.28                     94.72
48             1.4                      98.6
56             0.27                     99.73
72             0.01                     99.99
88             0                       100
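
A simulation of this general kind can be sketched as a simple Monte Carlo experiment. The version below is a simplified stand-in, not the simulation that produced the table: it draws the BACs of a contig uniformly at random from a single library and asks how often at least two of them share a plate. The number of plates and the 384-well plate size are placeholder assumptions, and the sketch ignores the parameters that were matched to the actual contigs.

```python
import random


def fraction_with_shared_plate(contig_size, n_plates,
                               wells_per_plate=384, runs=10_000, seed=0):
    """Estimate how often a random contig holds 2+ BACs from one plate.

    contig_size     -- number of BACs drawn per simulated contig
    n_plates        -- plates in the library (placeholder value, an assumption)
    wells_per_plate -- plate format (384 assumed here)
    runs            -- number of simulated contigs
    """
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    library_size = n_plates * wells_per_plate
    shared = 0
    for _ in range(runs):
        # Draw distinct clones; a clone index maps to a plate by integer division.
        clones = rng.sample(range(library_size), contig_size)
        plates = {clone // wells_per_plate for clone in clones}
        if len(plates) < contig_size:
            shared += 1  # at least two clones fell on the same plate
    return shared / runs


if __name__ == "__main__":
    for size in (2, 5, 10, 20, 40):
        pct = 100 * fraction_with_shared_plate(size, n_plates=300)
        print(f"{size:3d} BACs: {pct:.2f}% of runs with a shared plate")
```

Under this uniform model the result depends mainly on the number of plates, so the placeholder value would need to be replaced with the real library dimensions before comparing against the table.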


Assessing quality of contig assembly using BAC-marker associations
We are using the depth of coverage of markers in contigs to statistically assess the likely correctness of the FPC assembly. To do this we compare the number of times overlapping BACs are hit by a given marker. Although this analysis relies on the unproven assumption that all BACs that could be hit by a marker were identified, it nevertheless gives us a relative confidence measure for comparing different FPC assemblies. This process will be ongoing as more markers are assigned to BACs.

Anchoring BAC contigs to the genetic map
A number of labs are actively working to anchor BAC contigs to the genetic map:
  - Jackson (Purdue Univ.): overgos derived from soybean ESTs, selected genomic sequences and genomic sequence around SSRs
  - Shoemaker (Iowa State Univ.): overgos derived from soybean ESTs and selected genomic sequences; SSRs from Composite Genetic Map and newly identified in BACs assigned to contigs
  - Stacey (Univ. of Missouri): STSs derived from selected genomic sequences; SSRs from Composite Genetic Map