<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-334</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Xu</snm>
               <fnm>Hua</fnm>
               <insr iid="I1"/>
               <email>hua.xu@dbmi.columbia.edu</email>
            </au>
            <au id="A2">
               <snm>Markatou</snm>
               <fnm>Marianthi</fnm>
               <insr iid="I2"/>
               <email>mm168@columbia.edu</email>
            </au>
            <au id="A3">
               <snm>Dimova</snm>
               <fnm>Rositsa</fnm>
               <insr iid="I2"/>
               <email>rbd2107@columbia.edu</email>
            </au>
            <au id="A4">
               <snm>Liu</snm>
               <fnm>Hongfang</fnm>
               <insr iid="I3"/>
               <email>hl224@georgetown.edu</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Friedman</snm>
               <fnm>Carol</fnm>
               <insr iid="I1"/>
               <email>carol.friedman@dbmi.columbia.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biomedical Informatics, Columbia University, 622 168<sup>th </sup>St, New York City, New York, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biostatistics, Columbia University, 722 168<sup>th </sup>St, New York City, New York, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, 4000 Reservoir Rd, Washington DC, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>334</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/334</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16822321</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-334</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>26</day>
               <month>1</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>05</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Xu et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Word sense disambiguation (WSD) is critical in the biomedical domain for improving the precision of natural language processing (NLP), text mining, and information retrieval systems because ambiguous words negatively impact accurate access to literature containing biomolecular entities, such as genes, proteins, cells, diseases, and other important entities. Automated techniques have been developed that address the WSD problem for a number of text processing situations, but the problem is still a challenging one. Supervised WSD machine learning (ML) methods have been applied in the biomedical domain and have shown promising results, but the results typically incorporate a number of confounding factors, and it is problematic to truly understand the effectiveness and generalizability of the methods because these factors interact with each other and affect the final results. Thus, there is a need to explicitly address the factors and to systematically quantify their effects on performance.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Experiments were designed to measure the effect of "sample size" (i.e. size of the datasets), "sense distribution" (i.e. the distribution of the different meanings of the ambiguous word) and "degree of difficulty" (i.e. the measure of the distances between the meanings of the senses of an ambiguous word) on the performance of WSD classifiers. Support Vector Machine (SVM) classifiers were applied to an automatically generated data set containing four ambiguous biomedical abbreviations: <it>BPD</it>, <it>BSA</it>, <it>PCA</it>, and <it>RSV</it>, which were chosen because of varying degrees of differences in their respective senses. Results showed that: 1) increasing the sample size generally reduced the error rate, but this was limited mainly to well-separated senses (i.e. cases where the distances between the senses were large); in difficult cases an unusually large increase in sample size was needed to increase performance slightly, which was impractical, 2) the sense distribution did not have an effect on performance when the senses were separable, 3) when there was a majority sense of over 90%, the WSD classifier was not better than use of the simple majority sense, 4) error rates were proportional to the similarity of senses, and 5) there was no statistical difference between results when using a 5-fold or 10-fold cross-validation method. Other issues that impact performance are also enumerated.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Several different independent aspects affect performance when using ML techniques for WSD. We found that combining them into one single result obscures understanding of the underlying methods. Although we studied only four abbreviations, we utilized a well-established statistical method that guarantees the results are likely to be generalizable for abbreviations with similar characteristics. The results of our experiments show that in order to understand the performance of these ML methods it is critical that papers report on the baseline performance, the distribution and sample size of the senses in the datasets, and the standard deviation or confidence intervals. In addition, papers should also characterize the difficulty of the WSD task, the WSD situations addressed and not addressed, as well as the ML methods and features used. This should lead to an improved understanding of the generalizability and the limitations of the methodology.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="refman"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The use of large-scale experimental and information technologies has dramatically increased the pace of production of biomedical findings, and the number of scientific articles has grown rapidly as well, which makes it impossible for humans to retrieve or keep up to date with all the related information from the literature. During the last few years, there has been a surge of interest in information extraction and text mining of the biomedical literature <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. When mining the biomedical literature, a big challenge is the problem of ambiguity inherent in natural language because one textual term may have several different meanings or senses (homonymy). A number of natural language processing systems in the biomedical domain reported decreased precision due to the ambiguity problem <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Weeber <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> found that in order to replicate Swanson's literature-based discovery of the involvement of magnesium deficiency in migraine, it was important to resolve the ambiguity of the abbreviation <it>mg</it>, which can denote either <it>magnesium </it>or <it>milligram</it>.</p>
         <p>WSD is critical for the biomedical text processing community but also difficult, because new words and new senses appear rapidly as more biomedical entities are discovered. In 2000, the UMLS Metathesaurus <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, a comprehensive resource that specifies and categorizes biomedical concepts, contained 9,416 ambiguous terms, and in 2004, the number increased to 21,295, an increase of 126% within 4 years <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. More importantly, this figure does not include the many terms associated with genes or gene products, and therefore the amount of ambiguity is likely to be much larger. Studies associated with gene names have shown that the ambiguity problem is complicated because a gene term: 1) may refer to a gene or another type of biomedical term <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, or to a general English word <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>; 2) may be used to denote an RNA, a protein, or a gene <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>; or 3) may be highly ambiguous across multiple species <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. If each ambiguous gene symbol in an article were accompanied by its corresponding long form, the disambiguation task would be much easier. However, Schuemie <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> analyzed 3,902 biomedical full-text articles and found that only 30% of the gene symbols in the abstracts were accompanied by their corresponding full names, and only 18% of the gene symbols in the full text were accompanied by their gene names. Schijvenaars <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> showed that 33% of the human genes in their thesaurus were affected by homonymy. 
Chen <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> found that 85.1% of mouse genes were ambiguous with other gene names and 233% additional 'gene' instances were retrieved when gene names that were also English words were included when processing the literature.</p>
         <p>To demonstrate the extent of the ambiguity problem in MEDLINE we searched MEDLINE abstracts to determine how many abstracts contained gene symbols that were ambiguous with general English words or biomedical terms. Using data from Entrez Gene<abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, the gene-specific database at the National Center for Biotechnology Information (NCBI), we formed two ambiguous word lists for the mouse organism: a gene-English list (containing mouse gene symbols ambiguous with general English words) and a gene-UMLS list (containing mouse gene symbols ambiguous with biomedical terms from UMLS). Then we searched 82,922 abstracts that are known to be related to mouse genes (based on <it>gene2pubmed </it>file from Entrez Gene, downloaded on 1/2006) to determine the number of abstracts that contained at least one ambiguous word in each of the above two lists respectively, so that we could determine the percent of abstracts that contained a word that was ambiguous with an English word or with a UMLS term respectively. We repeated the same procedure for the fly and yeast organisms as well. Results showed that for the mouse organism alone, 99.7% (82694/82922) of the abstracts were affected by an ambiguity between a gene symbol and a general English word, and 99.8% (82736/82922) were affected by an ambiguity between a gene symbol and a UMLS term. For the fly organism, both numbers were also over 99%, while the number was much less for the yeast organism: 4.6% and 3.1% respectively. To demonstrate that the ambiguity problem is not limited to a small set of words, we systematically removed ambiguous words with a frequency (ratio between the number of abstracts containing the word and the total number of abstracts searched) higher than a threshold and re-calculated the percentage of abstracts that contained the remaining ambiguous words. 
In order to reduce the percentage of abstracts with ambiguity from gene-English and gene-UMLS to a relatively low level (7.2% and 13.4% respectively), ambiguous words with frequencies higher than 0.05% would have to be removed, which covered 30.0% (319 out of 1,065 words) and 30.8% (636 out of 2,064 words) of all the ambiguous words in the two lists respectively. The same study, which was also performed for the fly organism, showed similar results, but with slightly higher ambiguity rates. This study shows that the ambiguity among gene symbols, English words and other biomedical terms is extensive and the distribution of ambiguity is very sparse. This study therefore demonstrates that word sense disambiguation is critical for biomedical text mining and retrieval tasks because ambiguous words have a substantial effect on performance. For the details of the ambiguity study, please refer to the sub-section "Gene Ambiguity for mining MEDLINE" in the Methods section.</p>
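         <p>The frequency-threshold analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: abstracts are assumed to be already tokenized into sets of words, and the function names <it>word_frequencies</it> and <it>affected_fraction</it>, as well as the toy data, are hypothetical.

```python
def word_frequencies(abstracts, ambiguous_words):
    """Fraction of abstracts that contain each ambiguous word."""
    total = len(abstracts)
    return {
        word: sum(1 for tokens in abstracts if word in tokens) / total
        for word in ambiguous_words
    }


def affected_fraction(abstracts, ambiguous_words, max_frequency=None):
    """Fraction of abstracts containing at least one ambiguous word.

    If max_frequency is given, words whose frequency is higher than that
    threshold are removed from the list before counting.
    """
    words = set(ambiguous_words)
    if max_frequency is not None:
        freqs = word_frequencies(abstracts, words)
        words = {w for w in words if max_frequency >= freqs[w]}
    hits = sum(1 for tokens in abstracts if words.intersection(tokens))
    return hits / len(abstracts)


# Toy data (hypothetical): each abstract is a set of lower-cased tokens.
abstracts = [
    {"the", "blind", "gene", "was", "expressed"},
    {"the", "mice", "were", "fed"},
    {"was", "observed", "in", "mice"},
    {"dose", "of", "10", "mg"},
]
ambiguous = {"was", "blind", "mg"}

print(affected_fraction(abstracts, ambiguous))        # all words kept
print(affected_fraction(abstracts, ambiguous, 0.25))  # frequent words removed
```

In the toy data, removing the single most frequent ambiguous word still leaves half of the abstracts affected, which is the kind of effect the threshold analysis is meant to expose.</p>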
         <p>Research in automated WSD can be traced back to the 1950s <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. A number of WSD methods have been proposed for the general English domain. More recently, supervised machine learning (ML) technologies have received considerable attention and have shown promising results <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Bruce <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> applied a Bayesian algorithm and chose features based on their "informative" nature. They tested their methods on the <it>interest </it>corpus, which is a corpus consisting of 6 different senses for the word <it>interest</it>, and achieved a precision of 79%. Lee <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> evaluated a variety of knowledge sources (including the parts-of-speech of neighbouring words, single words in the surrounding context, local collocations, and syntactic relations) and supervised learning algorithms (including Support Vector Machines (SVM), Naive Bayes, AdaBoost, and decision tree algorithms) for WSD on the SENSEVAL-1 and SENSEVAL-2 <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> data. Using all of the knowledge sources, the SVM method achieved the highest accuracy rate of 65.4%. Mohammad <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> studied the contribution of lexical features and syntactic features to WSD, and results showed that simple lexical features (words in context and collocation) used in conjunction with part of speech information achieved better results (an accuracy of 66.7% on the SENSEVAL-2 set) than other feature combinations.</p>
         <p>Another type of WSD approach uses established knowledge from curated terminology systems <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. In the biomedical domain, Schijvenaars <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> developed a simple thesaurus-based algorithm to disambiguate human gene symbols using training data from PubMed abstracts and annotations from the Online Mendelian Inheritance in Man (OMIM) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The system achieved an accuracy rate of 92.7% on an automatically generated testing set. Schijvenaars's study described an effective method for gene disambiguation, but the evaluation results were limited to certain conditions. The automatically generated testing set contained human gene symbols that appeared as long-form and short-form pairs (e.g. prostate specific antigen (PSA)) in articles, where at least 6 articles were determined to be associated with each gene sense. However, in situations where the gene symbol in the paper is ambiguous with a common English word or other type of biomedical word, which is not an abbreviation (i.e. the long form-short form pair is not applicable), the performance of the method is not known: a complete non-abbreviated word may have different characteristics in the text than an abbreviation. For example, this method may not be appropriate for testing a word such as "blind", which is not an abbreviation, but refers to both a gene and a general English word. An additional issue is that this study limited the disambiguation of gene symbols to gene senses and one other category called "non-gene sense", but the actual sense in this category was not resolved. This could be critical for NLP systems accessing phenotypic or disease-related information. An additional limitation of a knowledge-based method is that terms associated with phenotypic senses or general English senses may have little reliable background knowledge available. 
Therefore, this type of method may not be applicable and ML approaches may be useful. Recently, Humphrey <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> proposed another type of statistical-based method to resolve the ambiguity problem within the UMLS Metathesaurus. They used a Journal Descriptor Indexing (JDI) method, which is ultimately based on statistical associations between words in a training set of MEDLINE citations and a small set of journal descriptors assumed to be inherited by the citations. On a testing set with 45 ambiguous strings from NLM's WSD Test Collection, the overall average precision for the highest-scoring JDI version was 0.7873 compared to 0.2492 for the baseline method.</p>
         <p>Supervised ML methods have also been applied to WSD in the biomedical domain. Hatzivassiloglou <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> developed a disambiguation system to determine the class of a known biomedical named entity by choosing one of three pre-defined senses: gene, RNA, or protein. They investigated the contribution of different features (positional information of surrounding words, capitalization information, removal of stop-words and similarly distributed words, and stemming) and obtained accuracy rates of up to 85% with an optimised feature combination. Ginter <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> introduced a new family of classifiers, which were based on an ordering and weighing of the feature vectors obtained from word counts and word co-occurrence in the text. This method was used to determine whether a term was a gene versus a protein and achieved 86% accuracy. Podowski <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> built a two-step classification system to disambiguate gene symbols: the first classifier determined whether the word was a gene versus a non-gene, and the other determined the appropriate gene for a symbol classified as a gene by the first classifier. They reported an F-measure of over 0.7 for genes with a sufficient number of known document references. Liu <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> investigated the effect of window size and claimed that biomedical ambiguous words needed a larger window size than general English ambiguous words. In Liu's <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> paper, the gold standard data set was automatically constructed utilizing the fact that authors sometimes define abbreviations when they are first introduced in documents using parenthesized expressions [e.g. <it>Androgen therapy prolongs complete remission in acute myeloblastic leukemia (AML)</it>] and that the same abbreviation had the same sense within a document. 
The training data set was automatically annotated using unambiguous synonyms, and for some senses, there were limited samples (e.g. <it>PCA </it>with the sense "posterior communicating artery" consisted of only 5 abstracts) for certain datasets. In this study, we used 4 abbreviations from Liu's abbreviation list. However, we used a different method to collect the datasets because we wanted to control the sample sizes of the senses for our experiments. Leroy <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> tried to reduce the training sample size by supplying external knowledge from the UMLS for supervised machine learning algorithms, but the results were not promising. Gaudan <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> developed an algorithm based on use of SVMs to resolve abbreviations in MEDLINE and claimed a precision of 98.9% and a recall of 98.2% on their testing set. In their study, rare senses (senses appearing in fewer than 40 documents) were excluded from the testing set. This makes the disambiguation task easier because it reduces the problem of sparse senses. In addition, the training set was created based on long-form and short-form pairs, where ambiguous words not having long-forms were not tested. There is a good review of current research of WSD in the biomedical domain by Schuemie <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
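         <p>The parenthesized-definition heuristic mentioned above, where a long form is followed by its short form in parentheses, can be illustrated with a small sketch. It is not the matching procedure of any of the cited systems. It assumes that the short form is an upper-case token in parentheses and that the long form is the run of words immediately before it whose initial letters spell the short form. The function name <it>find_definitions</it> is hypothetical.

```python
import re

# An upper-case short form of 2-10 letters or digits inside parentheses.
SHORT_FORM = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_definitions(sentence):
    """Return (long_form, short_form) pairs found in a sentence.

    Assumption: the long form is the sequence of words immediately before
    the parenthesis, one word per character of the short form, and the
    word initials must spell the short form (case-insensitive).
    """
    pairs = []
    for match in SHORT_FORM.finditer(sentence):
        short = match.group(1)
        words = sentence[:match.start()].split()
        candidate = words[-len(short):]
        initials = "".join(w[0] for w in candidate).upper()
        if len(candidate) == len(short) and initials == short:
            pairs.append((" ".join(candidate), short))
    return pairs


text = ("Androgen therapy prolongs complete remission in "
        "acute myeloblastic leukemia (AML)")
print(find_definitions(text))
# [('acute myeloblastic leukemia', 'AML')]
```

A rule this strict misses long forms that contain function words or hyphens, and it does nothing for ambiguous words that are never defined in the text, which is the limitation noted above for test sets built from long-form and short-form pairs.</p>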
         <p>Most of the above papers reporting on the use of ML for WSD follow a similar pattern. A set of ambiguous words is selected, a corpus for each word is collected, and the different senses within the corpus are annotated (automatically or manually). A feature vector is then formed based on the context of the ambiguous word, a supervised machine-learning algorithm is used on a portion of the corpus to train a classifier for the word, and the classifier is tested on the remaining corpus. The main variations are usually in the selection of features and choice of machine-learning algorithms. Experiments are usually performed on a fixed number of documents (e.g. 1,000 abstracts) per ambiguous word, where the entire set consists of all the senses, and the sense distribution is generally uneven. Results (usually error rate or accuracy) are reported and an analysis of a few issues is often described, but the results of different experiments are usually not comparable because multiple confounding issues affect them. These papers are important in that they report on useful methods and provide insights and overall results. However, a deeper and more systematic analysis is needed in order to obtain a better understanding of the different factors affecting the performance of ML methods for WSD. In this paper, we discuss a number of issues explicitly and describe some experiments that simulate a variety of situations where different sense distributions, different sample sizes, different levels of difficulty, and different cross validation methods are studied and the effects are quantified. We subsequently based our assessment of performance on error rates and associated standard errors. 
Although some issues we have addressed in this paper have been mentioned by other papers, our work differs from related work because we focus on a systematic study of issues affecting performance and quantify their effects in order to further understanding of the components of the error rate, which should lead to an improved and more generalizable methodology. Our method also differs from related work because the sample size for each sense is always fixed, whereas in related work the sample size for the entire corpus is generally fixed but not the sample sizes of the senses.</p>
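         <p>Two steps of the general pattern described above, forming a feature vector from the context of the ambiguous word and fixing the sample size per sense rather than per corpus, can be sketched as follows. This is a minimal illustration under assumptions and not the feature set or sampling code used in this study: the window size, the bag-of-words representation, and the names <it>context_features</it> and <it>sample_per_sense</it> are all hypothetical.

```python
import random
from collections import Counter


def context_features(tokens, target, window=3):
    """Bag-of-words counts for tokens within `window` positions of target."""
    features = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            features.update(left + right)
    return features


def sample_per_sense(corpus, sizes, seed=0):
    """Draw a fixed number of documents for each sense.

    corpus: dict mapping a sense label to a list of documents
    sizes:  dict mapping a sense label to the number of documents to draw
    """
    rng = random.Random(seed)
    return {sense: rng.sample(corpus[sense], n) for sense, n in sizes.items()}


tokens = "serum levels of BSA were measured in all samples".split()
print(context_features(tokens, "BSA", window=2))

# Hypothetical corpus: 50 placeholder documents per sense.
corpus = {"BSA1": [f"doc{i}" for i in range(50)],
          "BSA2": [f"doc{i}" for i in range(50, 100)]}
sample = sample_per_sense(corpus, {"BSA1": 10, "BSA2": 10})
print({sense: len(docs) for sense, docs in sample.items()})
```

Fixing the number of documents drawn per sense makes the sense distribution an explicit input of the experiment instead of a by-product of how the corpus was collected.</p>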
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Four ambiguous abbreviations, <it>BPD</it>, <it>BSA</it>, <it>PCA</it>, and <it>RSV</it>, were used in this study. They were chosen because they were associated with varying degrees of differences between their respective senses. For example, two of the senses of <it>PCA </it>studied are very similar whereas two senses of <it>BSA </it>are very different. Table <tblr tid="T1">1</tblr> lists the detailed information about the abbreviations and their senses, and the Methods section explains the differences in more detail. For each abbreviation, we measured error rates of the SVM classifier under different combinations of sample size, sense distribution, cross validation scheme (5-fold vs. 10-fold), and multi-class SVM algorithm (for <it>BPD </it>only, which has 3 different senses). For details of the testing data set and experimental design, please refer to the Methods section.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Information of abbreviation data set</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>Abbreviation</p>
                  </c>
                  <c ca="left">
                     <p>Sense #</p>
                  </c>
                  <c ca="left">
                     <p>Sense</p>
                  </c>
                  <c ca="left">
                     <p># of retrieved articles</p>
                  </c>
                  <c ca="left">
                     <p>Sense Distribution</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>BPD</p>
                  </c>
                  <c ca="left">
                     <p>BPD1</p>
                  </c>
                  <c ca="left">
                     <p>borderline personality disorder</p>
                  </c>
                  <c ca="left">
                     <p>1584</p>
                  </c>
                  <c ca="left">
                     <p>32%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BPD2</p>
                  </c>
                  <c ca="left">
                     <p>bronchopulmonary dysplasia</p>
                  </c>
                  <c ca="left">
                     <p>2335</p>
                  </c>
                  <c ca="left">
                     <p>47%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BPD3</p>
                  </c>
                  <c ca="left">
                     <p>biparietal diameter</p>
                  </c>
                  <c ca="left">
                     <p>1032</p>
                  </c>
                  <c ca="left">
                     <p>21%</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>BSA</p>
                  </c>
                  <c ca="left">
                     <p>BSA1</p>
                  </c>
                  <c ca="left">
                     <p>bovine serum albumin</p>
                  </c>
                  <c ca="left">
                     <p>13352</p>
                  </c>
                  <c ca="left">
                     <p>89%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>BSA2</p>
                  </c>
                  <c ca="left">
                     <p>body surface area</p>
                  </c>
                  <c ca="left">
                     <p>5815</p>
                  </c>
                  <c ca="left">
                     <p>11%</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>PCA</p>
                  </c>
                  <c ca="left">
                     <p>PCA1</p>
                  </c>
                  <c ca="left">
                     <p>posterior cerebral artery</p>
                  </c>
                  <c ca="left">
                     <p>1165</p>
                  </c>
                  <c ca="left">
                     <p>67%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>PCA2</p>
                  </c>
                  <c ca="left">
                     <p>posterior communicating artery</p>
                  </c>
                  <c ca="left">
                     <p>584</p>
                  </c>
                  <c ca="left">
                     <p>33%</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RSV</p>
                  </c>
                  <c ca="left">
                     <p>RSV1</p>
                  </c>
                  <c ca="left">
                     <p>respiratory syncytial virus</p>
                  </c>
                  <c ca="left">
                     <p>5295</p>
                  </c>
                  <c ca="left">
                     <p>60%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>RSV2</p>
                  </c>
                  <c ca="left">
                     <p>rous sarcoma virus</p>
                  </c>
                  <c ca="left">
                     <p>3520</p>
                  </c>
                  <c ca="left">
                     <p>40%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
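         <p>As a worked example of the Sense Distribution column, the sketch below recomputes the values for <it>BPD </it>from the article counts in Table 1, assuming that the distribution is each sense's share of all retrieved articles for the abbreviation. The function name <it>sense_distribution</it> is hypothetical.

```python
def sense_distribution(article_counts):
    """Share of retrieved articles per sense, as fractions summing to 1."""
    total = sum(article_counts.values())
    return {sense: count / total for sense, count in article_counts.items()}


# Retrieved-article counts for BPD as listed in Table 1.
bpd = {"BPD1": 1584, "BPD2": 2335, "BPD3": 1032}
for sense, share in sense_distribution(bpd).items():
    print(sense, f"{share:.0%}")
# BPD1 32%
# BPD2 47%
# BPD3 21%
```
</p>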
         <p>Tables <tblr tid="T2">2</tblr>, <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr> display the results for <it>BSA</it>, <it>PCA </it>and <it>RSV</it>, each of which has two senses. The distribution shown with bold font in column 1 is the estimated distribution of the senses, which is calculated based on the number of retrieved articles for each sense and the number of retrieved articles for all the senses. Column 2 is the number of total samples from all senses. The sample size per sense ranges from 10 to 40, in increments of 10. Average error rates (Err. Rate) and average standard errors (SE) are reported for each combination of distribution and sample size (see Methods section).</p>
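         <p>The error rates and standard errors in these tables come from the SVM experiments described in the Methods section. As a point of comparison only, the sketch below estimates the same two quantities for a simple majority-sense baseline under 5-fold and 10-fold cross-validation. It is not the procedure used in this study: the labels are hypothetical, and the standard error is assumed here to be the sample standard deviation of the per-fold error rates divided by the square root of the number of folds.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def majority_baseline_cv(labels, k):
    """Mean error rate and standard error of a majority-sense baseline.

    Assumption: the standard error is the sample standard deviation of
    the per-fold error rates divided by sqrt(k).
    """
    folds = k_fold_indices(len(labels), k)
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        wrong = sum(1 for i in fold if labels[i] != majority)
        errors.append(wrong / len(fold))
    mean = sum(errors) / k
    variance = sum((e - mean) ** 2 for e in errors) / (k - 1)
    return mean, math.sqrt(variance / k)


# Hypothetical labels: 36 documents of one sense and 4 of another (0.9, 0.1).
labels = ["BSA1"] * 36 + ["BSA2"] * 4
for k in (5, 10):
    mean, se = majority_baseline_cv(labels, k)
    print(k, round(mean, 3), round(se, 3))
```

With a 90% majority sense the baseline error rate is already 10%, so a classifier has to do better than that to be worth reporting, which is why the baseline and the sense distribution need to be stated alongside any error rate.</p>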
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Results for <it>BSA </it>data set. Annotation of the table: Dist: Distribution of senses; S. Size: sample size; Err. Rate: Error Rate; SE: Standard Error of error rates; CV: cross-validation.</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c ca="left">
                     <p>
                        <b>BSA</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>5-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>10-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dist</p>
                  </c>
                  <c ca="center">
                     <p>S. Size</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.5, 0.5)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>21.83%</p>
                  </c>
                  <c ca="center">
                     <p>10.05%</p>
                  </c>
                  <c ca="center">
                     <p>19.67%</p>
                  </c>
                  <c ca="center">
                     <p>9.04%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>11.17%</p>
                  </c>
                  <c ca="center">
                     <p>5.33%</p>
                  </c>
                  <c ca="center">
                     <p>11.08%</p>
                  </c>
                  <c ca="center">
                     <p>5.05%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>5.08%</p>
                  </c>
                  <c ca="center">
                     <p>2.60%</p>
                  </c>
                  <c ca="center">
                     <p>5.04%</p>
                  </c>
                  <c ca="center">
                     <p>2.44%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>3.11%</p>
                  </c>
                  <c ca="center">
                     <p>1.72%</p>
                  </c>
                  <c ca="center">
                     <p>2.61%</p>
                  </c>
                  <c ca="center">
                     <p>1.48%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.6, 0.4)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>23.50%</p>
                  </c>
                  <c ca="center">
                     <p>10.21%</p>
                  </c>
                  <c ca="center">
                     <p>21.00%</p>
                  </c>
                  <c ca="center">
                     <p>9.21%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>12.67%</p>
                  </c>
                  <c ca="center">
                     <p>5.75%</p>
                  </c>
                  <c ca="center">
                     <p>12.08%</p>
                  </c>
                  <c ca="center">
                     <p>5.34%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>5.75%</p>
                  </c>
                  <c ca="center">
                     <p>2.82%</p>
                  </c>
                  <c ca="center">
                     <p>5.00%</p>
                  </c>
                  <c ca="center">
                     <p>2.48%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>3.58%</p>
                  </c>
                  <c ca="center">
                     <p>1.85%</p>
                  </c>
                  <c ca="center">
                     <p>3.28%</p>
                  </c>
                  <c ca="center">
                     <p>1.67%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <b>(0.7, 0.3)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>24.33%</p>
                  </c>
                  <c ca="center">
                     <p>10.59%</p>
                  </c>
                  <c ca="center">
                     <p>23.00%</p>
                  </c>
                  <c ca="center">
                     <p>9.74%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>14.67%</p>
                  </c>
                  <c ca="center">
                     <p>6.11%</p>
                  </c>
                  <c ca="center">
                     <p>12.75%</p>
                  </c>
                  <c ca="center">
                     <p>5.39%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>7.17%</p>
                  </c>
                  <c ca="center">
                     <p>3.16%</p>
                  </c>
                  <c ca="center">
                     <p>6.67%</p>
                  </c>
                  <c ca="center">
                     <p>2.87%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>4.86%</p>
                  </c>
                  <c ca="center">
                     <p>2.17%</p>
                  </c>
                  <c ca="center">
                     <p>4.00%</p>
                  </c>
                  <c ca="center">
                     <p>1.85%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.8, 0.2)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>19.33%</p>
                  </c>
                  <c ca="center">
                     <p>9.82%</p>
                  </c>
                  <c ca="center">
                     <p>19.33%</p>
                  </c>
                  <c ca="center">
                     <p>9.27%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>15.33%</p>
                  </c>
                  <c ca="center">
                     <p>6.31%</p>
                  </c>
                  <c ca="center">
                     <p>14.08%</p>
                  </c>
                  <c ca="center">
                     <p>5.72%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>9.13%</p>
                  </c>
                  <c ca="center">
                     <p>3.58%</p>
                  </c>
                  <c ca="center">
                     <p>8.00%</p>
                  </c>
                  <c ca="center">
                     <p>3.16%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>5.22%</p>
                  </c>
                  <c ca="center">
                     <p>2.23%</p>
                  </c>
                  <c ca="center">
                     <p>4.53%</p>
                  </c>
                  <c ca="center">
                     <p>1.96%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.9, 0.1)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>10.17%</p>
                  </c>
                  <c ca="center">
                     <p>7.55%</p>
                  </c>
                  <c ca="center">
                     <p>10.00%</p>
                  </c>
                  <c ca="center">
                     <p>7.07%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>10.17%</p>
                  </c>
                  <c ca="center">
                     <p>5.33%</p>
                  </c>
                  <c ca="center">
                     <p>10.00%</p>
                  </c>
                  <c ca="center">
                     <p>4.99%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>8.00%</p>
                  </c>
                  <c ca="center">
                     <p>3.38%</p>
                  </c>
                  <c ca="center">
                     <p>7.71%</p>
                  </c>
                  <c ca="center">
                     <p>3.13%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>6.42%</p>
                  </c>
                  <c ca="center">
                     <p>2.48%</p>
                  </c>
                  <c ca="center">
                     <p>6.03%</p>
                  </c>
                  <c ca="center">
                     <p>2.26%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <tbl id="T3">
            <title>
               <p>Table 3</p>
            </title>
            <caption>
               <p>Results for the <it>PCA</it> data set. Abbreviations: Dist: distribution of senses; S. Size: sample size; Err. Rate: error rate; SE: standard error of the error rates; CV: cross-validation.</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c ca="left">
                     <p>
                        <b>PCA</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>5-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>10-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dist</p>
                  </c>
                  <c ca="center">
                     <p>S. Size</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.5, 0.5)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>43.00%</p>
                  </c>
                  <c ca="center">
                     <p>12.14%</p>
                  </c>
                  <c ca="center">
                     <p>41.00%</p>
                  </c>
                  <c ca="center">
                     <p>11.25%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>34.58%</p>
                  </c>
                  <c ca="center">
                     <p>8.21%</p>
                  </c>
                  <c ca="center">
                     <p>34.33%</p>
                  </c>
                  <c ca="center">
                     <p>7.68%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>37.17%</p>
                  </c>
                  <c ca="center">
                     <p>5.44%</p>
                  </c>
                  <c ca="center">
                     <p>29.46%</p>
                  </c>
                  <c ca="center">
                     <p>5.14%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>28.53%</p>
                  </c>
                  <c ca="center">
                     <p>4.45%</p>
                  </c>
                  <c ca="center">
                     <p>31.47%</p>
                  </c>
                  <c ca="center">
                     <p>4.13%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.6, 0.4)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>37.83%</p>
                  </c>
                  <c ca="center">
                     <p>11.62%</p>
                  </c>
                  <c ca="center">
                     <p>38.50%</p>
                  </c>
                  <c ca="center">
                     <p>11.04%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>36.42%</p>
                  </c>
                  <c ca="center">
                     <p>8.12%</p>
                  </c>
                  <c ca="center">
                     <p>35.92%</p>
                  </c>
                  <c ca="center">
                     <p>7.37%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>25.54%</p>
                  </c>
                  <c ca="center">
                     <p>5.41%</p>
                  </c>
                  <c ca="center">
                     <p>24.88%</p>
                  </c>
                  <c ca="center">
                     <p>5.06%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>28.22%</p>
                  </c>
                  <c ca="center">
                     <p>4.25%</p>
                  </c>
                  <c ca="center">
                     <p>29.25%</p>
                  </c>
                  <c ca="center">
                     <p>3.96%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.7, 0.3)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>33.67%</p>
                  </c>
                  <c ca="center">
                     <p>11.48%</p>
                  </c>
                  <c ca="center">
                     <p>33.50%</p>
                  </c>
                  <c ca="center">
                     <p>10.90%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>33.08%</p>
                  </c>
                  <c ca="center">
                     <p>8.06%</p>
                  </c>
                  <c ca="center">
                     <p>33.08%</p>
                  </c>
                  <c ca="center">
                     <p>7.62%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>29.67%</p>
                  </c>
                  <c ca="center">
                     <p>5.38%</p>
                  </c>
                  <c ca="center">
                     <p>24.29%</p>
                  </c>
                  <c ca="center">
                     <p>4.98%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>26.83%</p>
                  </c>
                  <c ca="center">
                     <p>4.36%</p>
                  </c>
                  <c ca="center">
                     <p>27.83%</p>
                  </c>
                  <c ca="center">
                     <p>4.11%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.8, 0.2)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>23.67%</p>
                  </c>
                  <c ca="center">
                     <p>10.48%</p>
                  </c>
                  <c ca="center">
                     <p>24.50%</p>
                  </c>
                  <c ca="center">
                     <p>9.99%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>21.83%</p>
                  </c>
                  <c ca="center">
                     <p>7.01%</p>
                  </c>
                  <c ca="center">
                     <p>20.58%</p>
                  </c>
                  <c ca="center">
                     <p>6.61%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>28.00%</p>
                  </c>
                  <c ca="center">
                     <p>5.09%</p>
                  </c>
                  <c ca="center">
                     <p>19.25%</p>
                  </c>
                  <c ca="center">
                     <p>4.61%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>22.92%</p>
                  </c>
                  <c ca="center">
                     <p>3.97%</p>
                  </c>
                  <c ca="center">
                     <p>25.03%</p>
                  </c>
                  <c ca="center">
                     <p>3.65%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.9, 0.1)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>12.33%</p>
                  </c>
                  <c ca="center">
                     <p>8.14%</p>
                  </c>
                  <c ca="center">
                     <p>12.00%</p>
                  </c>
                  <c ca="center">
                     <p>7.59%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>10.92%</p>
                  </c>
                  <c ca="center">
                     <p>5.48%</p>
                  </c>
                  <c ca="center">
                     <p>11.08%</p>
                  </c>
                  <c ca="center">
                     <p>5.20%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>14.04%</p>
                  </c>
                  <c ca="center">
                     <p>4.10%</p>
                  </c>
                  <c ca="center">
                     <p>12.50%</p>
                  </c>
                  <c ca="center">
                     <p>3.79%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>10.42%</p>
                  </c>
                  <c ca="center">
                     <p>3.11%</p>
                  </c>
                  <c ca="center">
                     <p>11.14%</p>
                  </c>
                  <c ca="center">
                     <p>2.98%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <b>(0.67, 0.33)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>38.33%</p>
                  </c>
                  <c ca="center">
                     <p>11.89%</p>
                  </c>
                  <c ca="center">
                     <p>36.33%</p>
                  </c>
                  <c ca="center">
                     <p>10.96%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>30.17%</p>
                  </c>
                  <c ca="center">
                     <p>7.93%</p>
                  </c>
                  <c ca="center">
                     <p>29.50%</p>
                  </c>
                  <c ca="center">
                     <p>7.46%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>28.25%</p>
                  </c>
                  <c ca="center">
                     <p>5.43%</p>
                  </c>
                  <c ca="center">
                     <p>24.83%</p>
                  </c>
                  <c ca="center">
                     <p>5.00%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>29.47%</p>
                  </c>
                  <c ca="center">
                     <p>4.50%</p>
                  </c>
                  <c ca="center">
                     <p>35.33%</p>
                  </c>
                  <c ca="center">
                     <p>4.15%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <tbl id="T4">
            <title>
               <p>Table 4</p>
            </title>
            <caption>
               <p>Results for the <it>RSV</it> data set. Abbreviations: Dist: distribution of senses; S. Size: sample size; Err. Rate: error rate; SE: standard error of the error rates; CV: cross-validation.</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c ca="left">
                     <p>
                        <b>RSV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>5-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>
                        <b>10-fold CV</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dist</p>
                  </c>
                  <c ca="center">
                     <p>S. Size</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
                  <c ca="center">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="center">
                     <p>SE</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.5, 0.5)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>26.50%</p>
                  </c>
                  <c ca="center">
                     <p>10.52%</p>
                  </c>
                  <c ca="center">
                     <p>27.00%</p>
                  </c>
                  <c ca="center">
                     <p>9.72%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>18.83%</p>
                  </c>
                  <c ca="center">
                     <p>6.83%</p>
                  </c>
                  <c ca="center">
                     <p>17.83%</p>
                  </c>
                  <c ca="center">
                     <p>6.29%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>12.79%</p>
                  </c>
                  <c ca="center">
                     <p>4.09%</p>
                  </c>
                  <c ca="center">
                     <p>12.17%</p>
                  </c>
                  <c ca="center">
                     <p>3.78%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>10.58%</p>
                  </c>
                  <c ca="center">
                     <p>3.10%</p>
                  </c>
                  <c ca="center">
                     <p>10.69%</p>
                  </c>
                  <c ca="center">
                     <p>2.93%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <b>(0.6, 0.4)</b>
                     </p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>27.83%</p>
                  </c>
                  <c ca="center">
                     <p>10.78%</p>
                  </c>
                  <c ca="center">
                     <p>27.67%</p>
                  </c>
                  <c ca="center">
                     <p>10.09%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>20.25%</p>
                  </c>
                  <c ca="center">
                     <p>7.00%</p>
                  </c>
                  <c ca="center">
                     <p>19.50%</p>
                  </c>
                  <c ca="center">
                     <p>6.52%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>13.67%</p>
                  </c>
                  <c ca="center">
                     <p>4.25%</p>
                  </c>
                  <c ca="center">
                     <p>12.83%</p>
                  </c>
                  <c ca="center">
                     <p>3.91%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>11.53%</p>
                  </c>
                  <c ca="center">
                     <p>3.20%</p>
                  </c>
                  <c ca="center">
                     <p>10.39%</p>
                  </c>
                  <c ca="center">
                     <p>2.90%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.7, 0.3)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>27.33%</p>
                  </c>
                  <c ca="center">
                     <p>10.84%</p>
                  </c>
                  <c ca="center">
                     <p>26.33%</p>
                  </c>
                  <c ca="center">
                     <p>10.18%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>19.00%</p>
                  </c>
                  <c ca="center">
                     <p>6.81%</p>
                  </c>
                  <c ca="center">
                     <p>17.83%</p>
                  </c>
                  <c ca="center">
                     <p>6.23%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>13.96%</p>
                  </c>
                  <c ca="center">
                     <p>4.27%</p>
                  </c>
                  <c ca="center">
                     <p>13.08%</p>
                  </c>
                  <c ca="center">
                     <p>3.91%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>11.56%</p>
                  </c>
                  <c ca="center">
                     <p>3.23%</p>
                  </c>
                  <c ca="center">
                     <p>10.86%</p>
                  </c>
                  <c ca="center">
                     <p>2.97%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.8, 0.2)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>21.50%</p>
                  </c>
                  <c ca="center">
                     <p>10.20%</p>
                  </c>
                  <c ca="center">
                     <p>19.50%</p>
                  </c>
                  <c ca="center">
                     <p>9.20%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>17.08%</p>
                  </c>
                  <c ca="center">
                     <p>6.60%</p>
                  </c>
                  <c ca="center">
                     <p>16.75%</p>
                  </c>
                  <c ca="center">
                     <p>6.17%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>14.00%</p>
                  </c>
                  <c ca="center">
                     <p>4.29%</p>
                  </c>
                  <c ca="center">
                     <p>13.29%</p>
                  </c>
                  <c ca="center">
                     <p>3.96%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>11.69%</p>
                  </c>
                  <c ca="center">
                     <p>3.26%</p>
                  </c>
                  <c ca="center">
                     <p>10.75%</p>
                  </c>
                  <c ca="center">
                     <p>2.96%</p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.9, 0.1)</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>11.00%</p>
                  </c>
                  <c ca="center">
                     <p>7.77%</p>
                  </c>
                  <c ca="center">
                     <p>10.67%</p>
                  </c>
                  <c ca="center">
                     <p>7.25%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>40</p>
                  </c>
                  <c ca="center">
                     <p>10.58%</p>
                  </c>
                  <c ca="center">
                     <p>5.42%</p>
                  </c>
                  <c ca="center">
                     <p>10.33%</p>
                  </c>
                  <c ca="center">
                     <p>5.05%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>80</p>
                  </c>
                  <c ca="center">
                     <p>9.54%</p>
                  </c>
                  <c ca="center">
                     <p>3.66%</p>
                  </c>
                  <c ca="center">
                     <p>9.33%</p>
                  </c>
                  <c ca="center">
                     <p>3.41%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>120</p>
                  </c>
                  <c ca="center">
                     <p>8.67%</p>
                  </c>
                  <c ca="center">
                     <p>2.86%</p>
                  </c>
                  <c ca="center">
                     <p>8.36%</p>
                  </c>
                  <c ca="center">
                     <p>2.65%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>With a distribution of (0.5, 0.5) and 5-fold cross-validation, the error rate for <it>BSA </it>dropped from 21.83% at sample size 20 to 3.11% at sample size 120. Over the same increase in sample size, the error rate for <it>PCA </it>dropped only from 43.00% to 28.53%. Results for <it>BPD</it>, obtained with three different multi-class SVM algorithms, are shown in Table <tblr tid="T5">5</tblr>. We used Friedman's test <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> to compare the multi-class algorithms, stratifying the analysis by probability distribution and treating sample size (four levels) and multi-class algorithm (three levels) as the two factors in the ANOVA table. The analysis, adjusted appropriately for multiple testing, showed that only the pair ("one-vs-rest", "one-vs-one") differed; there was no statistically significant difference (at overall level &#945; = 0.1) between the "mc-svm" and "one-vs-rest" SVM algorithms. This agrees with the work of Rifkin and Klautau <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The different multi-class algorithms are described in the Methods section.</p>
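         <p>As an illustration of the comparison described above, the following minimal sketch computes Friedman's statistic for a single stratum, the (0.33, 0.33, 0.33) distribution with 5-fold cross-validation, using the error rates listed in Table <tblr tid="T5">5</tblr>. It assumes that sample sizes act as blocks and algorithms as treatments, uses the large-sample chi-square approximation without a tie correction, and omits the multiple-testing adjustment. The function and variable names are illustrative, and this is not necessarily the exact procedure used for the reported analysis.</p>

```python
import math
from bisect import bisect_left, bisect_right

# 5-fold CV error rates (%) copied from Table 5 for the (0.33, 0.33, 0.33)
# distribution. Each row is one sample size (30, 60, 120, 180); the columns
# are mc-svm, one-vs-rest, one-vs-one.
ERROR_RATES = [
    [26.56, 25.78, 29.22],
    [13.89, 13.39, 15.83],
    [7.67, 7.08, 8.44],
    [6.06, 5.70, 6.69],
]


def rank_row(row):
    """Rank the values in one block (1 = lowest), averaging ranks over ties."""
    ordered = sorted(row)
    return [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2 for v in row]


def friedman_statistic(blocks):
    """Friedman chi-square statistic (assumption: no tie correction applied)."""
    n = len(blocks)        # number of blocks (sample sizes)
    k = len(blocks[0])     # number of treatments (algorithms)
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


q = friedman_statistic(ERROR_RATES)
# With k = 3 treatments the reference chi-square distribution has 2 degrees of
# freedom, whose upper tail probability is exactly exp(-q / 2).
p_approx = math.exp(-q / 2)
print(f"Friedman statistic = {q:.2f}, approximate p-value = {p_approx:.4f}")
```

         <p>In this stratum "one-vs-rest" has the lowest error rate and "one-vs-one" the highest at every sample size, so the rank sums are as far apart as four blocks allow. With so few blocks the chi-square approximation is rough, and pairwise comparisons with a multiplicity adjustment would still be needed to identify which algorithms differ.</p>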
         <tbl id="T5">
            <title>
               <p>Table 5</p>
            </title>
            <caption>
               <p>Results for the <it>BPD </it>data set. Abbreviations: Dist.: distribution of senses; S. size: sample size; Err. Rate: error rate; SE: standard error of the error rate; CV: cross-validation.</p>
            </caption>
            <tblbdy cols="14">
               <r>
                  <c ca="left">
                     <p>
                        <b>BPD</b>
                     </p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="6" ca="left">
                     <p>
                        <b>5-fold CV</b>
                     </p>
                  </c>
                  <c cspan="6" ca="left">
                     <p>
                        <b>10-fold CV</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="14">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="2" ca="left">
                     <p>mc-svm</p>
                  </c>
                  <c cspan="2" ca="left">
                     <p>one-vs-rest</p>
                  </c>
                  <c cspan="2" ca="left">
                     <p>one-vs-one</p>
                  </c>
                  <c cspan="2" ca="left">
                     <p>mc-svm</p>
                  </c>
                  <c cspan="2" ca="left">
                     <p>one-vs-rest</p>
                  </c>
                  <c cspan="2" ca="left">
                     <p>one-vs-one</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="12">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dist.</p>
                  </c>
                  <c ca="left">
                     <p>S. size</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
                  <c ca="left">
                     <p>Err. Rate</p>
                  </c>
                  <c ca="left">
                     <p>SE</p>
                  </c>
               </r>
               <r>
                  <c cspan="14">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.33, 0.33, 0.33)</p>
                  </c>
                  <c ca="left">
                     <p>30</p>
                  </c>
                  <c ca="left">
                     <p>26.56%</p>
                  </c>
                  <c ca="left">
                     <p>8.77%</p>
                  </c>
                  <c ca="left">
                     <p>25.78%</p>
                  </c>
                  <c ca="left">
                     <p>8.68%</p>
                  </c>
                  <c ca="left">
                     <p>29.22%</p>
                  </c>
                  <c ca="left">
                     <p>9.06%</p>
                  </c>
                  <c ca="left">
                     <p>23.89%</p>
                  </c>
                  <c ca="left">
                     <p>8.05%</p>
                  </c>
                  <c ca="left">
                     <p>23.44%</p>
                  </c>
                  <c ca="left">
                     <p>7.96%</p>
                  </c>
                  <c ca="left">
                     <p>26.22%</p>
                  </c>
                  <c ca="left">
                     <p>8.31%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>13.89%</p>
                  </c>
                  <c ca="left">
                     <p>4.89%</p>
                  </c>
                  <c ca="left">
                     <p>13.39%</p>
                  </c>
                  <c ca="left">
                     <p>4.83%</p>
                  </c>
                  <c ca="left">
                     <p>15.83%</p>
                  </c>
                  <c ca="left">
                     <p>5.18%</p>
                  </c>
                  <c ca="left">
                     <p>11.89%</p>
                  </c>
                  <c ca="left">
                     <p>4.30%</p>
                  </c>
                  <c ca="left">
                     <p>11.56%</p>
                  </c>
                  <c ca="left">
                     <p>4.23%</p>
                  </c>
                  <c ca="left">
                     <p>13.78%</p>
                  </c>
                  <c ca="left">
                     <p>4.62%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>120</p>
                  </c>
                  <c ca="left">
                     <p>7.67%</p>
                  </c>
                  <c ca="left">
                     <p>2.66%</p>
                  </c>
                  <c ca="left">
                     <p>7.08%</p>
                  </c>
                  <c ca="left">
                     <p>2.55%</p>
                  </c>
                  <c ca="left">
                     <p>8.44%</p>
                  </c>
                  <c ca="left">
                     <p>2.80%</p>
                  </c>
                  <c ca="left">
                     <p>7.00%</p>
                  </c>
                  <c ca="left">
                     <p>2.40%</p>
                  </c>
                  <c ca="left">
                     <p>6.39%</p>
                  </c>
                  <c ca="left">
                     <p>2.31%</p>
                  </c>
                  <c ca="left">
                     <p>8.06%</p>
                  </c>
                  <c ca="left">
                     <p>2.58%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>180</p>
                  </c>
                  <c ca="left">
                     <p>6.06%</p>
                  </c>
                  <c ca="left">
                     <p>1.96%</p>
                  </c>
                  <c ca="left">
                     <p>5.70%</p>
                  </c>
                  <c ca="left">
                     <p>1.90%</p>
                  </c>
                  <c ca="left">
                     <p>6.69%</p>
                  </c>
                  <c ca="left">
                     <p>2.05%</p>
                  </c>
                  <c ca="left">
                     <p>5.70%</p>
                  </c>
                  <c ca="left">
                     <p>1.79%</p>
                  </c>
                  <c ca="left">
                     <p>5.20%</p>
                  </c>
                  <c ca="left">
                     <p>1.72%</p>
                  </c>
                  <c ca="left">
                     <p>6.24%</p>
                  </c>
                  <c ca="left">
                     <p>1.88%</p>
                  </c>
               </r>
               <r>
                  <c cspan="14">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.6, 0.2, 0.2)</p>
                  </c>
                  <c ca="left">
                     <p>30</p>
                  </c>
                  <c ca="left">
                     <p>26.33%</p>
                  </c>
                  <c ca="left">
                     <p>8.91%</p>
                  </c>
                  <c ca="left">
                     <p>25.33%</p>
                  </c>
                  <c ca="left">
                     <p>8.75%</p>
                  </c>
                  <c ca="left">
                     <p>26.89%</p>
                  </c>
                  <c ca="left">
                     <p>8.97%</p>
                  </c>
                  <c ca="left">
                     <p>24.67%</p>
                  </c>
                  <c ca="left">
                     <p>8.21%</p>
                  </c>
                  <c ca="left">
                     <p>23.89%</p>
                  </c>
                  <c ca="left">
                     <p>8.06%</p>
                  </c>
                  <c ca="left">
                     <p>25.44%</p>
                  </c>
                  <c ca="left">
                     <p>8.30%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>16.28%</p>
                  </c>
                  <c ca="left">
                     <p>5.27%</p>
                  </c>
                  <c ca="left">
                     <p>15.56%</p>
                  </c>
                  <c ca="left">
                     <p>5.16%</p>
                  </c>
                  <c ca="left">
                     <p>17.67%</p>
                  </c>
                  <c ca="left">
                     <p>5.44%</p>
                  </c>
                  <c ca="left">
                     <p>15.33%</p>
                  </c>
                  <c ca="left">
                     <p>4.85%</p>
                  </c>
                  <c ca="left">
                     <p>14.00%</p>
                  </c>
                  <c ca="left">
                     <p>4.65%</p>
                  </c>
                  <c ca="left">
                     <p>16.33%</p>
                  </c>
                  <c ca="left">
                     <p>4.97%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>120</p>
                  </c>
                  <c ca="left">
                     <p>10.11%</p>
                  </c>
                  <c ca="left">
                     <p>3.05%</p>
                  </c>
                  <c ca="left">
                     <p>9.22%</p>
                  </c>
                  <c ca="left">
                     <p>2.93%</p>
                  </c>
                  <c ca="left">
                     <p>10.50%</p>
                  </c>
                  <c ca="left">
                     <p>3.10%</p>
                  </c>
                  <c ca="left">
                     <p>9.36%</p>
                  </c>
                  <c ca="left">
                     <p>2.78%</p>
                  </c>
                  <c ca="left">
                     <p>8.50%</p>
                  </c>
                  <c ca="left">
                     <p>2.66%</p>
                  </c>
                  <c ca="left">
                     <p>10.00%</p>
                  </c>
                  <c ca="left">
                     <p>2.87%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>180</p>
                  </c>
                  <c ca="left">
                     <p>7.72%</p>
                  </c>
                  <c ca="left">
                     <p>2.21%</p>
                  </c>
                  <c ca="left">
                     <p>6.89%</p>
                  </c>
                  <c ca="left">
                     <p>2.09%</p>
                  </c>
                  <c ca="left">
                     <p>8.09%</p>
                  </c>
                  <c ca="left">
                     <p>2.26%</p>
                  </c>
                  <c ca="left">
                     <p>6.93%</p>
                  </c>
                  <c ca="left">
                     <p>1.98%</p>
                  </c>
                  <c ca="left">
                     <p>6.37%</p>
                  </c>
                  <c ca="left">
                     <p>1.91%</p>
                  </c>
                  <c ca="left">
                     <p>7.41%</p>
                  </c>
                  <c ca="left">
                     <p>2.04%</p>
                  </c>
               </r>
               <r>
                  <c cspan="14">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(0.8, 0.1, 0.1)</p>
                  </c>
                  <c ca="left">
                     <p>30</p>
                  </c>
                  <c ca="left">
                     <p>18.11%</p>
                  </c>
                  <c ca="left">
                     <p>7.82%</p>
                  </c>
                  <c ca="left">
                     <p>18.11%</p>
                  </c>
                  <c ca="left">
                     <p>7.83%</p>
                  </c>
                  <c ca="left">
                     <p>19.00%</p>
                  </c>
                  <c ca="left">
                     <p>7.99%</p>
                  </c>
                  <c ca="left">
                     <p>18.33%</p>
                  </c>
                  <c ca="left">
                     <p>7.41%</p>
                  </c>
                  <c ca="left">
                     <p>18.22%</p>
                  </c>
                  <c ca="left">
                     <p>7.40%</p>
                  </c>
                  <c ca="left">
                     <p>19.00%</p>
                  </c>
                  <c ca="left">
                     <p>7.53%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>14.78%</p>
                  </c>
                  <c ca="left">
                     <p>5.10%</p>
                  </c>
                  <c ca="left">
                     <p>14.28%</p>
                  </c>
                  <c ca="left">
                     <p>5.03%</p>
                  </c>
                  <c ca="left">
                     <p>15.39%</p>
                  </c>
                  <c ca="left">
                     <p>5.18%</p>
                  </c>
                  <c ca="left">
                     <p>14.67%</p>
                  </c>
                  <c ca="left">
                     <p>4.79%</p>
                  </c>
                  <c ca="left">
                     <p>13.83%</p>
                  </c>
                  <c ca="left">
                     <p>4.67%</p>
                  </c>
                  <c ca="left">
                     <p>14.78%</p>
                  </c>
                  <c ca="left">
                     <p>4.81%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>120</p>
                  </c>
                  <c ca="left">
                     <p>9.31%</p>
                  </c>
                  <c ca="left">
                     <p>2.95%</p>
                  </c>
                  <c ca="left">
                     <p>8.69%</p>
                  </c>
                  <c ca="left">
                     <p>2.85%</p>
                  </c>
                  <c ca="left">
                     <p>9.50%</p>
                  </c>
                  <c ca="left">
                     <p>2.98%</p>
                  </c>
                  <c ca="left">
                     <p>8.56%</p>
                  </c>
                  <c ca="left">
                     <p>2.67%</p>
                  </c>
                  <c ca="left">
                     <p>8.06%</p>
                  </c>
                  <c ca="left">
                     <p>2.59%</p>
                  </c>
                  <c ca="left">
                     <p>8.75%</p>
                  </c>
                  <c ca="left">
                     <p>2.70%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>180</p>
                  </c>
                  <c ca="left">
                     <p>6.87%</p>
                  </c>
                  <c ca="left">
                     <p>2.09%</p>
                  </c>
                  <c ca="left">
                     <p>6.59%</p>
                  </c>
                  <c ca="left">
                     <p>2.05%</p>
                  </c>
                  <c ca="left">
                     <p>7.17%</p>
                  </c>
                  <c ca="left">
                     <p>2.14%</p>
                  </c>
                  <c ca="left">
                     <p>6.35%</p>
                  </c>
                  <c ca="left">
                     <p>1.91%</p>
                  </c>
                  <c ca="left">
                     <p>5.87%</p>
                  </c>
                  <c ca="left">
                     <p>1.84%</p>
                  </c>
                  <c ca="left">
                     <p>6.61%</p>
                  </c>
                  <c ca="left">
                     <p>1.94%</p>
                  </c>
               </r>
               <r>
                  <c cspan="14">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <b>(0.32, 0.47, 0.21)</b>
                     </p>
                  </c>
                  <c ca="left">
                     <p>30</p>
                  </c>
                  <c ca="left">
                     <p>24.22%</p>
                  </c>
                  <c ca="left">
                     <p>8.58%</p>
                  </c>
                  <c ca="left">
                     <p>23.33%</p>
                  </c>
                  <c ca="left">
                     <p>8.44%</p>
                  </c>
                  <c ca="left">
                     <p>26.67%</p>
                  </c>
                  <c ca="left">
                     <p>8.84%</p>
                  </c>
                  <c ca="left">
                     <p>23.00%</p>
                  </c>
                  <c ca="left">
                     <p>7.93%</p>
                  </c>
                  <c ca="left">
                     <p>21.67%</p>
                  </c>
                  <c ca="left">
                     <p>7.71%</p>
                  </c>
                  <c ca="left">
                     <p>25.22%</p>
                  </c>
                  <c ca="left">
                     <p>8.17%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>15.89%</p>
                  </c>
                  <c ca="left">
                     <p>5.21%</p>
                  </c>
                  <c ca="left">
                     <p>14.89%</p>
                  </c>
                  <c ca="left">
                     <p>5.08%</p>
                  </c>
                  <c ca="left">
                     <p>16.83%</p>
                  </c>
                  <c ca="left">
                     <p>5.34%</p>
                  </c>
                  <c ca="left">
                     <p>14.11%</p>
                  </c>
                  <c ca="left">
                     <p>4.66%</p>
                  </c>
                  <c ca="left">
                     <p>13.33%</p>
                  </c>
                  <c ca="left">
                     <p>4.55%</p>
                  </c>
                  <c ca="left">
                     <p>15.39%</p>
                  </c>
                  <c ca="left">
                     <p>4.80%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>120</p>
                  </c>
                  <c ca="left">
                     <p>9.19%</p>
                  </c>
                  <c ca="left">
                     <p>2.91%</p>
                  </c>
                  <c ca="left">
                     <p>7.92%</p>
                  </c>
                  <c ca="left">
                     <p>2.71%</p>
                  </c>
                  <c ca="left">
                     <p>10.25%</p>
                  </c>
                  <c ca="left">
                     <p>3.07%</p>
                  </c>
                  <c ca="left">
                     <p>8.36%</p>
                  </c>
                  <c ca="left">
                     <p>2.64%</p>
                  </c>
                  <c ca="left">
                     <p>7.53%</p>
                  </c>
                  <c ca="left">
                     <p>2.50%</p>
                  </c>
                  <c ca="left">
                     <p>9.50%</p>
                  </c>
                  <c ca="left">
                     <p>2.80%</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>180</p>
                  </c>
                  <c ca="left">
                     <p>6.07%</p>
                  </c>
                  <c ca="left">
                     <p>1.95%</p>
                  </c>
                  <c ca="left">
                     <p>5.48%</p>
                  </c>
                  <c ca="left">
                     <p>1.85%</p>
                  </c>
                  <c ca="left">
                     <p>6.78%</p>
                  </c>
                  <c ca="left">
                     <p>2.07%</p>
                  </c>
                  <c ca="left">
                     <p>5.39%</p>
                  </c>
                  <c ca="left">
                     <p>1.73%</p>
                  </c>
                  <c ca="left">
                     <p>4.61%</p>
                  </c>
                  <c ca="left">
                     <p>1.61%</p>
                  </c>
                  <c ca="left">
                     <p>6.07%</p>
                  </c>
                  <c ca="left">
                     <p>1.85%</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>Figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr> and <figr fid="F3">3</figr> show the error rate versus the sample size for each distribution of the <it>BSA</it>, <it>PCA </it>and <it>RSV </it>data sets with 5-fold cross-validation. As the figures indicate, the reduction of the error rate as a function of the sample size is more dramatic for <it>BSA </it>than for <it>PCA</it>. For <it>BSA </it>there is about a four-fold reduction in the error rate when the sample size increases from 20 to 80 for sense distributions (0.5, 0.5), (0.6, 0.4) and (0.7, 0.3), while there is a two-fold reduction for sense distribution (0.8, 0.2) and no reduction for (0.9, 0.1). In contrast, for <it>RSV</it>, a two-fold reduction of the error rate was observed for distributions (0.5, 0.5), (0.6, 0.4), (0.7, 0.3) and (0.8, 0.2) for an increase in the sample size from 20 to 80. The distribution (0.9, 0.1) behaved the same as it did for <it>BSA</it>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Error Rate versus Sample Size with different sense distributions of <it>BSA </it>data set</p>
            </caption>
            <text>
               <p><b>Error Rate versus Sample Size with different sense distributions of <it>BSA </it>data set</b>. This figure shows the plots of "error rate" versus "sample size" with different sense distributions of <it>BSA </it>data set (case where the 2 ambiguous senses are very different) using 5-fold cross-validation.</p>
            </text>
            <graphic file="1471-2105-7-334-1"/>
         </fig>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Error Rate versus Sample Size with different sense distributions of <it>PCA </it>data set</p>
            </caption>
            <text>
               <p><b>Error Rate versus Sample Size with different sense distributions of <it>PCA </it>data set</b>. This figure shows the plots of "error rate" versus "sample size" with different sense distributions of <it>PCA </it>data set (case where the 2 ambiguous senses are very similar) using 5-fold cross validation.</p>
            </text>
            <graphic file="1471-2105-7-334-2"/>
         </fig>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Error Rate versus Sample Size with different sense distributions of <it>RSV </it>data set</p>
            </caption>
            <text>
               <p><b>Error Rate versus Sample Size with different sense distributions of <it>RSV </it>data set</b>. This figure shows the plots of "error rate" versus "sample size" with different sense distributions of <it>RSV </it>data set (case where the 2 ambiguous senses both refer to viruses but the viruses are different types of viruses) using 5-fold cross validation.</p>
            </text>
            <graphic file="1471-2105-7-334-3"/>
         </fig>
         <p>For <it>BSA </it>and <it>RSV </it>there was no significant effect of the sense distribution on the error rate at any of the sample sizes, but for <it>PCA </it>the effect of the sense distribution on the error rate was significant. Multiple comparisons, adjusted for multiple testing, indicated that at an overall significance level of 0.1, the sense distributions (0.5, 0.5) and (0.6, 0.4) impact the error rate. These results show that almost balanced sense distributions and rather large training sample sizes reduce the error rate to approximately half of our best guess, which is using the majority sense.</p>
         <p>To address the issue of whether a meaningful reduction in the error rate was achieved by increasing the sample size, we performed further statistical analysis on the results of the <it>BSA </it>and <it>PCA </it>data sets. To test the null hypothesis of no differences in the error rates among the different sample sizes (and overall probability distribution) for the <it>BSA </it>and <it>PCA </it>abbreviations, we used Friedman's test. We then performed a sub-analysis using the sign test (see Methods section for details). The results are summarized as follows, and they apply to both the 5-fold and 10-fold cross-validation schemes. When the senses are well separated, any increase in the sample size results in a statistically significant decrease of the error rate. This holds for all sense distributions and it is in agreement with the finding that for <it>BSA </it>there was no significant effect of the sense distributions on the error rates for the different sample sizes used. There are, however, differences when the meanings of the senses are not well separated (e.g. <it>PCA</it>). As Friedman's test indicated, the effect of the sense distribution on the error rate is significant. When the sense distribution is (0.5, 0.5) there are statistically significant differences between the pairs of error rates produced under sample sizes (20 and 120), (40 and 120) and (80 and 120). The differences in the error rates produced under sample sizes (20 and 40) and (20 and 80) are borderline significant (overall level &#945; = 0.05). When the sense distribution is (0.6, 0.4), an increase in the sample size from 20 to 40 and from 80 to 120 does not produce statistically significant differences in the corresponding error rates. For all other sense distributions, an increase in the sample size did not produce a significant reduction in the error rate; that is, there are no statistically significant differences between the error rates. 
We would like to stress a limitation of the current study: the experiments were carried out only 30 times, and this rather small number of replications may have contributed to observing borderline significance.</p>
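         <p>To illustrate the kind of paired comparison used in the sub-analysis, the following Python sketch computes an exact two-sided sign test for error rates obtained under two sample sizes. It is a minimal sketch under stated assumptions, not the code used in this study: the function name <it>sign_test</it> and the example error rates are hypothetical, tied pairs are simply dropped, and no adjustment for multiple testing is applied.</p>
         <p>
```python
from math import comb


def sign_test(errors_a, errors_b):
    """Exact two-sided sign test for paired observations.

    Pairs with equal values (ties) are dropped. Returns the p-value under
    the null hypothesis that positive and negative differences are equally
    likely.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("the two samples must be paired")
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of k or fewer successes for Binomial(n, 0.5),
    # doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical error rates from six paired runs (not data from this study).
errors_n20 = [0.24, 0.22, 0.27, 0.23, 0.25, 0.21]
errors_n80 = [0.10, 0.12, 0.09, 0.11, 0.13, 0.08]
p_value = sign_test(errors_n20, errors_n80)
```
         </p>
         <p>With only a few replications the smallest attainable p-value is bounded below by the number of pairs, which is one way a small number of runs can lead to borderline significance.</p>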
         <p>Figure <figr fid="F4">4</figr> shows plots of the error rate versus sample size for each distribution of the <it>BPD </it>data set based on the 5-fold cross validation using the "one-vs-rest" algorithm. The plots for the four different sense distributions are very similar and agree with the statistical results, which indicated that the effect of the different distributions on the error rate is insignificant. Figure <figr fid="F5">5</figr> shows error rate versus sample size plots for three different abbreviations (<it>BSA</it>, <it>RSV</it>, and <it>PCA</it>) at the same distribution (0.5, 0.5). It is presented to show the relative degree of difficulty of the different abbreviations. As expected, the error rate had the following order: <it>BSA </it>&lt;<it>RSV </it>&lt;<it>PCA</it>, which indicated that similar meanings were more difficult to classify. Results from 5-fold cross-validation showed no statistical difference from results from 10-fold cross-validation, which indicated that 5-fold cross-validation might be used in evaluation in order to save computational power (for a discussion of the relative merits of 5-fold cross-validation vs. 10-fold cross-validation, see <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>).</p>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Error Rate versus Sample Size with different sense distributions of <it>BPD </it>data set</p>
            </caption>
            <text>
               <p><b>Error Rate versus Sample Size with different sense distributions of <it>BPD </it>data set</b>. This figure shows the plots of "error rate" versus "sample size" with different sense distributions of <it>BPD </it>data set (where there are 3 ambiguous senses that are different) using 5-fold cross validation and "one-vs-rest" algorithm.</p>
            </text>
            <graphic file="1471-2105-7-334-4"/>
         </fig>
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Error Rate versus Sample Size for <it>BSA</it>, <it>RSV </it>and <it>PCA </it>with sense distribution of "(0.5,0.5)"</p>
            </caption>
            <text>
               <p><b>Error Rate versus Sample Size for <it>BSA</it>, <it>RSV </it>and <it>PCA </it>with sense distribution of "(0.5,0.5)"</b>. This figure shows the plots of "error rate" versus "sample size" for <it>BSA</it>, <it>RSV </it>and <it>PCA </it>data sets with fixed distribution of "(0.5, 0.5)" using 5-fold cross validation.</p>
            </text>
            <graphic file="1471-2105-7-334-5"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Issues and our experiments</p>
            </st>
            <p>"Sample size", "sense distribution" and "degree of difficulty" were three of multiple confounding issues that affect the performance of a WSD classifier. Results from our experiments demonstrated that these three factors were intrinsically connected. Notice that, as expected, with any distribution, the error rate generally decreased as the sample size increased. However, the observed decrease in error rate was more dramatic in the cases where the different senses were well separated. For example, in <it>BSA</it>, the error rate dropped to approximately 5% when the sample size was 80 and the sense distributions were almost balanced, and it was approximately 8% for other distributions with the same size. Notice also the relatively small standard deviations that are associated with those error rates. Moreover, when two senses of a word are very different, then the reduction that is observed in the error rate is meaningful in the sense that it is generally outside the limits of (error rate) &#177; (1 SE) for increases in the sample size from 20 to 40 to 80. In contrast, when the separation between the two senses is poor (i.e. when the senses of an abbreviation are similar to each other), increasing the sample size does not help much, and a very large increase in size is needed for a small reduction in the error rate. In particular, we notice that when the sense distribution (P1, P2) was very unbalanced (i.e. (0.9, 0.1)), then the error rate was almost equal to the minority sense proportion. All these findings indicate that the effectiveness of an increase in the sample size is very dependent on the degree of difficulty. When the degree of difficulty is very high, increasing the sample size will not help much unless an extraordinarily large size is used, which would be very costly.</p>
            <p>There are different types of WSD and some are more difficult than others. For example, if two senses are syntactically different, a reliable part of speech tagging method could be effective in resolving the ambiguity. For senses that correspond to the same syntactic category, the similarity of their semantic categories will affect the difficulty of the task (e.g. the <it>bovine serum albumin </it>sense of <it>BSA </it>is substantially different from the <it>body surface area </it>sense). Even for senses within the same semantic class, two close senses will be much more difficult to classify than two unrelated meanings. For example, in <it>RSV</it>, both senses (i.e. <it>respiratory syncytial virus </it>and <it>Rous sarcoma virus</it>) are associated with a "virus" concept, but the two concepts are very different types of viruses, and therefore the contexts in which they occur are likely to be different as well. As shown in Figure <figr fid="F5">5</figr>, <it>PCA</it>, which has two very close senses, had much higher error rates than <it>BSA</it>, which has two unrelated senses. Therefore, when comparing the performance of different WSD systems, data sets with the same degree of difficulty should be used. Resnik <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> stated the importance of the semantic similarity of senses and proposed a method to compute performance, which takes similarity of senses into account. Our study is different because it quantified the effect of similarity of senses, and studied the relation between "similarity of senses" and other issues such as "sample size" and "sense distribution". When considering gene symbol disambiguation, we could categorize the tasks as involving four different types of disambiguation: 1) classifying whether a term is a noun or another syntactic part of speech, such as a verb, in which case the term cannot be a gene; 2) classifying whether a term refers to a gene or a non-gene sense (e.g. 
a general English word or other biomedical term); 3) classifying which gene a term refers to if it is ambiguous with multiple genes or which non-gene sense a term refers to if it is ambiguous with multiple non-gene senses; 4) classifying which product (gene, RNA, protein) a term refers to if it is known to be a particular gene. Podowski's <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> work covered task types 2 and 3, while Hatzivassiloglou's <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> work addressed task type 4. Many evaluations report their results for a set of words, but the difficulty levels and disambiguation task types are not stratified.</p>
            <p>To be able to identify whether there are significant differences in the error rates due to different sample sizes and sense distributions while controlling for the abbreviation used, we used Friedman's procedure. Notice that if we stratify by the abbreviation, the mean error rates form a two-way table where the columns correspond to different sample sizes and the rows correspond to different sense distributions. The significance of this methodology is that it provides a comprehensive way to quantify the effects of sample size and sense distribution on the error rate. For <it>BSA</it>, <it>RSV </it>and <it>BPD</it>, we found that the effect of the sense distribution on the error rate was insignificant. For <it>PCA </it>this effect was significant. The effect of different sample sizes on the error rate was significant for <it>BSA</it>, <it>RSV</it>, and <it>BPD</it>. For <it>PCA</it>, although the effect of sample size on the error rate was significant, this effect was observed only when the sample size was increased from 20 to 120, and for fairly balanced sense distributions such as (0.5, 0.5) and (0.6, 0.4). For those two distributions, an increase from 20 to 80 was also significant. Smaller increases in the sample size had an insignificant effect.</p>
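            <p>The two-way layout described above can be illustrated with a short Python sketch that computes the Friedman rank statistic for a table of mean error rates. This is a simplified sketch under stated assumptions and not necessarily the exact procedure described in the Methods section: rows (sense distributions) are treated as blocks and columns (sample sizes) as treatments, tied values receive average ranks without a tie correction, the function names are hypothetical, and the example table contains made-up values.</p>
            <p>
```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic for a blocks-by-treatments table."""
    n_blocks = len(table)
    n_treatments = len(table[0])
    rank_sums = [0.0] * n_treatments
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return (12 * sum_sq / (n_blocks * n_treatments * (n_treatments + 1))
            - 3 * n_blocks * (n_treatments + 1))


# Hypothetical mean error rates: one row per sense distribution,
# one column per sample size (not data from this study).
error_table = [
    [0.24, 0.16, 0.09, 0.06],
    [0.23, 0.15, 0.09, 0.07],
    [0.25, 0.15, 0.10, 0.06],
]
statistic = friedman_statistic(error_table)
```
            </p>
            <p>Under the null hypothesis of no sample-size effect, the statistic is commonly compared with a chi-square distribution with (number of columns minus 1) degrees of freedom; with few rows this approximation is rough, and exact tables may be preferable.</p>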
            <p>We performed further sub-analysis using non-parametric multiple comparisons to identify the pairs of sample sizes that differ when the abbreviations <it>BSA </it>and <it>RSV </it>were analyzed. This analysis revealed that in the case of <it>BSA </it>the improvements in terms of error rate were statistically significant across distributions as the sample size increased from 20 to 40. For the case of <it>RSV</it>, a much more substantial four-fold increase in the sample size was needed in order to observe an appreciable decrease of the error rate. Effects of "sense distribution" have been addressed in other papers <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B37">37</abbr></abbrgrp> because it is believed that the performance of a WSD classifier may change if the distribution of the different senses is unbalanced. For example, when there is a majority sense for an ambiguous word, the improvement of a WSD classifier is believed to be very small. Results from our study showed there was a difference only when the distribution was very uneven and the task was difficult. For example, for <it>PCA</it>, when the majority sense was over 0.8, the error rate started to decrease and when it was over 0.9, the error rate dramatically decreased so that use of the majority sense was as effective as the ML methods, but with much less cost.</p>
         </sec>
         <sec>
            <st>
               <p>Other confounding issues of WSD</p>
            </st>
            <p>Other issues in addition to sample size, distribution of senses, and difficulty of the task also affect the performance and subsequent assessment of WSD classifiers, as noted below:</p>
            <sec>
               <st>
                  <p>&#8226; Features used</p>
               </st>
               <p>As often discussed in various papers, different features were evaluated to see their contribution to classifier performance <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B20">20</abbr><abbr bid="B29">29</abbr></abbrgrp>. From these papers, there was no single combination of features that seemed to be associated with the best results for any type of WSD task. This could also be due to the existence of other confounding factors in the datasets that were used. In our study, we controlled for this factor by using "bag-of-word" features in all experiments, but it would be interesting to see if the performance improves when different feature vectors are used.</p>
            </sec>
            <sec>
               <st>
                  <p>&#8226; ML algorithm</p>
               </st>
               <p>Most papers reported that different ML algorithms did not show much difference in performance <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>, but some reported that certain classification algorithms were better than others. For example, Mooney <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> did a comparison study among a na&#239;ve Bayes classifier, perceptron, decision-tree learner, k-nearest-neighbor classifier, logic-based disjunctive normal form, conjunctive normal form and a decision-list learner, and the results showed that the na&#239;ve Bayes and perceptron classifiers performed significantly better than all others. The issue remains unclear, probably due to the interaction of different combinations of issues. The comparison between different classifiers should be a carefully controlled experiment. The notion that a lower absolute error rate is indicative of the superiority of a classifier is generally flawed because it ignores the possibility that the differences in the different experiments performed are not statistically significant <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Statistical tests <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp> can be used to compare different classifiers.</p>
            </sec>
            <sec>
               <st>
                  <p>&#8226; Baseline reported</p>
               </st>
               <p>It is very important that the baseline of a classification task is reported because it shows how much of an improvement there is using a classifier as compared to the baseline. As shown in our experiments, when there is a majority sense of 0.9 or more, the performance of a WSD classifier may seem high, but that is not due to the classifier. Several papers <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp> realized this issue and reported results for the baseline. More specifically, they excluded samples with a majority sense larger than a threshold because they realized the contribution of the classifier would not be much for those cases.</p>
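               <p>The role of the baseline can be made concrete with a small Python sketch. It assumes the baseline is the strategy of always choosing the majority sense, so that the baseline error rate is one minus the largest sense proportion; the function names and the classifier error rates below are illustrative only.</p>
               <p>
```python
def majority_baseline_error(sense_distribution):
    """Error rate of always predicting the most frequent sense."""
    return 1.0 - max(sense_distribution)


def relative_error_reduction(classifier_error, sense_distribution):
    """Fraction of the baseline error that the classifier removes."""
    baseline = majority_baseline_error(sense_distribution)
    if baseline == 0:
        raise ValueError("baseline error is zero; reduction is undefined")
    return (baseline - classifier_error) / baseline


# Illustrative values: the same 5% gap in error rate means very
# different things under balanced and unbalanced sense distributions.
balanced = relative_error_reduction(0.05, (0.5, 0.5))
unbalanced = relative_error_reduction(0.05, (0.9, 0.1))
```
               </p>
               <p>In this illustration a classifier error rate of 5% removes 90% of the baseline error for the (0.5, 0.5) distribution but only about half of it for (0.9, 0.1), which is why an error rate reported without its baseline is difficult to interpret.</p>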
            </sec>
            <sec>
               <st>
                  <p>&#8226; Results with confidence intervals</p>
               </st>
               <p>When reporting the results (i.e. error rate), not all papers reported confidence intervals (or a similar metric, such as standard deviations). When comparing the performance of WSD classifiers, those metrics are critical because they indicate whether or not an improvement is statistically significant; if there is a large deviation, there may not actually be an improvement even though one error rate is smaller than the other.</p>
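               <p>As a sketch of how such metrics might be reported, the following Python code computes the mean error rate over repeated runs together with a normal-approximation 95% confidence interval, and checks whether two intervals overlap. The function names and error rates are hypothetical, and the multiplier 1.96 is an assumption that is only approximately right for small numbers of runs, where a t-based multiplier would be wider.</p>
               <p>
```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(error_rates, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = mean(error_rates)
    standard_error = stdev(error_rates) / sqrt(len(error_rates))
    return m, m - z * standard_error, m + z * standard_error


def intervals_overlap(ci_a, ci_b):
    """True when two (mean, lower, upper) intervals share at least one point."""
    return ci_a[2] >= ci_b[1] and ci_b[2] >= ci_a[1]


# Hypothetical error rates from five repeated runs of two classifiers.
ci_first = mean_with_ci([0.10, 0.12, 0.11, 0.09, 0.13])
ci_second = mean_with_ci([0.20, 0.22, 0.21, 0.19, 0.23])
clearly_different = not intervals_overlap(ci_first, ci_second)
```
               </p>
               <p>Non-overlapping intervals are a conservative visual check and not a substitute for a formal test: overlapping intervals do not by themselves show that two error rates are equivalent.</p>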
            </sec>
            <sec>
               <st>
                  <p>&#8226; Feasibility</p>
               </st>
               <p>One of the problems of supervised machine learning for WSD is the need for an annotated training (and testing) data set for each ambiguous word, which may require a huge effort. There are two approaches that address this problem: 1) designing an efficient sampling method to lower the cost of manual sense tagging <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, or 2) use of an automated method to generate sense-tagged data <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B42">42</abbr></abbrgrp>, but this may not always be possible or may inadvertently introduce bias. In our study, we proposed a simple "full-form substitution" method, which is described in more detail in the Methods section, to automatically generate training data, but this is only applicable for abbreviations.</p>
               <p>In this study, we used a "full-form substitution" method to automatically generate the data set for the experiments, which is an artificial training set. We compared the estimated sense distribution from our method with that of Liu's method <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and found they were similar for most of the abbreviations (e.g. <it>RSV, BPD, BSA</it>), and that the majority senses based on use of each method were the same. We did not compare the substitution method with other methods for WSD. In addition, we used an SVM classifier for all the experiments. Since the goals of our study did not include the comparison of different algorithms, we do not present related results here. Other studies showed that different ML algorithms had similar performance for WSD tasks <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Thus, it is likely that our findings are applicable to other ML methods because similar issues have been discussed in the general ML literature <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
               <p>Earlier studies have investigated a number of the issues discussed here in the context of constructing better classifiers. A discussion of some of the issues involved can be found in <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Here, we examined these issues in the context of word sense disambiguation. The methodology we used to quantify the impact of various factors on the error rate, and hence on the performance of the WSD classifier, is a well-known, theory-based, statistical methodology. The methodology is easy to apply, it provides a principled way of studying the effects of the different factors on the error rate, and since it is based on a strong theoretical foundation, it guarantees that the results apply to all abbreviations with similar characteristics. Therefore, although we studied only four abbreviations, the results concerning sample size, sense similarity, and distributions are likely to be generalizable for abbreviations with similar characteristics. The results presented here agree with general results presented in the literature on the performance of classifiers <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Future work</p>
            </st>
            <p>To further analyze the effects of "sample size", "sense distribution" and "degree of difficulty" on the error rate, an error decomposition model will be explored. Methods to measure the distance between different senses are also being studied.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper, we aimed to further an understanding of the different factors affecting the performance of ML techniques for WSD by systematically simulating a variety of situations where different sample sizes, sense distributions, degrees of difficulty, and cross-validation methods were used. We evaluated the performance of SVM classifiers for those situations. Results from our experiments showed that: 1) increasing the sample size generally reduced the classifier error rate, but this was limited mainly to well-separated senses (such as senses with different semantic types or senses with the same semantic types but unrelated meanings); in difficult cases an unusually large increase in sample size was needed to increase performance slightly, which was costly and impractical, 2) the sense distribution did not have much effect on classifier performance for cases where the senses were separable, 3) when there was a majority sense of over 90%, choosing the majority sense seemed to be the most effective strategy because the cost was low as was the error rate, 4) the error rate increased with the similarity of senses, and 5) there was no statistical difference between results using 5-fold or 10-fold cross-validation. In this paper, we also demonstrated that ambiguity of biomedical entities is a significant problem, which has a substantial impact on text mining and retrieval tasks in the biomedical domain.</p>
         <p>ML methods are still needed for WSD, which is critical for increasing the accuracy of biomedical natural language, text mining, and information retrieval systems. ML methods are especially important for those cases that cannot readily be addressed using knowledge-based methods. Therefore it is important that we understand the different elements affecting their performance. In order to improve our understanding of the ML methods, it is critical that in addition to reporting on overall results, papers also report on the baseline performance, the distribution of senses in the datasets, the standard deviation or confidence intervals, the types of ambiguity addressed, and the difficulty of the task as well as the methods and features used.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>After manually reviewing a set of WSD papers in the biomedical domain, different issues associated with performance were enumerated. For an initial study, we conducted experiments to evaluate the effect of three confounding issues: "sample size", "sense distribution" and "degree of difficulty", and we used an automatically generated data set. A discussion of the results and issues can be found in the Results and Discussion sections.</p>
         <sec>
            <st>
               <p>Data set for experiments</p>
            </st>
            <p>Four abbreviations were used in the experiments. Table <tblr tid="T1">1</tblr> lists the detailed information about the abbreviations and their senses. These abbreviations were originally specified in the ABBR data set <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. We chose them by considering the different levels of semantic similarity among their senses. <it>BSA </it>denotes two senses that have very different meanings, but <it>PCA </it>denotes two senses that have very similar meanings; the two senses of <it>RSV </it>are both associated with a virus, but the viruses are very different types; finally, <it>BPD </it>denotes three very different senses. The original data set for <it>PCA </it>contained 6 different senses, but we only used the two that were very similar for our experiments. We used a simple "full-form substitution" method to automatically generate a data set for the experiments described in this paper, and this data set was partitioned into training and testing sets. To perform the "full-form substitution" for each sense of an abbreviation, PubMed articles published before October 2005 were searched using an exact string match for the full form of the sense. The full form in the title or abstract of the article was then replaced with the ambiguous abbreviation, and the appropriate sense was noted separately. Table <tblr tid="T1">1</tblr> shows the number of articles that were obtained for the different abbreviations and senses. The estimated sense distribution was calculated from the number of retrieved articles and is displayed in the last column. For each sense, we recorded all the retrieved PMIDs, randomly selected 250, and then obtained the corresponding abstracts to form a data pool, from which the samples for all the experiments were drawn.</p>
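            <p>The "full-form substitution" step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the returned dictionary layout, and the choice of a case-sensitive match are assumptions, and the example strings in the usage note are illustrative only.</p>

```python
import re


def substitute_full_form(text, full_form, abbreviation):
    """Replace each exact occurrence of full_form in text with abbreviation.

    Matching is case-sensitive here, which is an assumption; the source only
    states that an exact string match was used.
    Returns the rewritten text and the number of replacements made.
    """
    pattern = re.compile(re.escape(full_form))
    return pattern.subn(abbreviation, text)


def build_instance(text, full_form, abbreviation, sense_label):
    """Build one labelled instance, or return None if the full form is absent.

    The sense label is stored separately from the rewritten text, mirroring
    the idea that the correct sense is noted apart from the abstract.
    """
    new_text, count = substitute_full_form(text, full_form, abbreviation)
    if count == 0:
        return None
    return {"text": new_text, "sense": sense_label}
```

            <p>For example, calling the hypothetical <it>build_instance </it>helper with the text "Levels of bovine serum albumin were measured.", the full form "bovine serum albumin" and the abbreviation "BSA" yields the text "Levels of BSA were measured." together with the recorded sense.</p>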
         </sec>
         <sec>
            <st>
               <p>Feature vector and machine-learning algorithm</p>
            </st>
            <p>For all the experiments in this paper, we used the simple "bag-of-words" method to construct the feature vector. All the words in the title and abstract of the articles were used as features for machine learning, and an SVM algorithm was used to generate a classifier. We used a package called "Spider" <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> to perform all the SVM training and testing. For abbreviations with only two senses (<it>BSA, PCA, RSV</it>), a binary SVM classifier was used. For <it>BPD</it>, which has three different senses, three different multi-class SVM methods <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> were used: "mc-svm", "one-vs-rest", and "one-vs-one". "Mc-svm" implements the algorithm with a decision function that considers all classes at once, while "one-vs-rest" and "one-vs-one" are constructed by combining several binary SVM classifiers. "One-vs-rest", also known as "one-against-all", constructs N binary SVM classifiers for a classification task with N classes. The <it>i</it>th binary SVM classifier is trained by considering all instances associated with the <it>i</it>th class as positive examples and all other instances as negative examples. It applies the N classifiers and chooses the class whose classifier has the highest confidence. "One-vs-one", also known as "one-against-one", constructs N(N-1)/2 binary SVM classifiers, each trained with data from two classes: one as positive and one as negative. It applies these N(N-1)/2 SVM classifiers, and the class assignment is determined by a voting strategy (e.g. the class chosen by the largest number of SVM classifiers wins). The performance was measured using both a 5-fold and a 10-fold cross-validation method.</p>
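            <p>The two combination schemes described above can be illustrated with a small sketch of their decision rules. It does not train any SVM and does not use the Spider package; the function names are hypothetical, and the binary classifier outputs (confidence scores and pairwise winners) are assumed to be supplied by the caller.</p>

```python
from collections import Counter
from itertools import combinations


def one_vs_rest_decide(confidences):
    """Pick the class whose binary classifier reports the highest confidence.

    confidences maps each class label to the score of the classifier that was
    trained with that class as positive and all other classes as negative.
    """
    return max(confidences, key=confidences.get)


def one_vs_one_pairs(classes):
    """List the N(N-1)/2 class pairs, one binary classifier per pair."""
    return list(combinations(classes, 2))


def one_vs_one_decide(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers.

    Ties are broken in favour of the label that was seen first, which is an
    arbitrary choice made for this sketch.
    """
    votes = Counter(pairwise_winners)
    return votes.most_common(1)[0][0]
```

            <p>With three senses, as for <it>BPD</it>, the hypothetical <it>one_vs_one_pairs </it>helper returns three pairs, which matches N(N-1)/2 for N = 3.</p>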
         </sec>
         <sec>
            <st>
               <p>Experiments</p>
            </st>
            <p>For abbreviations with two senses (<it>BSA, PCA, RSV</it>), we simulated 5 different combinations of sense distribution, which were (0.5, 0.5), (0.6, 0.4), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1), and also used an additional combination, which was the estimated distribution of the senses. For example, a sample testing set with size 20 and sense distribution (0.5, 0.5) means that 10 samples in the set have one sense and the other 10 samples have the other sense. The estimated sense distribution is listed in the last column of Table <tblr tid="T1">1</tblr>; it was calculated from the number of retrieved articles for each sense and the number of retrieved articles for all the senses. For <it>BSA </it>and <it>RSV</it>, the estimated distributions were the same as one of the 5 simulated distributions, and therefore the experiments used only 5 combinations for those two. For <it>PCA</it>, the estimated distribution was (0.67, 0.33). Four different sample sizes were used (20, 40, 80 and 120), and for each, a proportional sample for each sense was obtained based on the particular distribution. For <it>BPD</it>, which has 3 senses, 4 distribution patterns were used: (0.33, 0.33, 0.33), (0.6, 0.2, 0.2), (0.8, 0.1, 0.1) and (0.32, 0.47, 0.21), where the last one was the estimated distribution. For each distribution pattern, 4 different total sample sizes were used: 30, 60, 120 and 180. Error rates for each combination of sense distribution and sample size were averaged over 30 runs.</p>
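            <p>The proportional sampling described above amounts to turning a total sample size and a sense distribution into a number of instances per sense. The sketch below is one way to do this; the function name is hypothetical, and the rounding rule (largest remainder) is an assumption, since the text does not state how non-integer counts were handled.</p>

```python
from fractions import Fraction
from math import floor


def sense_counts(total, distribution):
    """Split total samples across senses in proportion to distribution.

    Proportions are converted to exact fractions to avoid floating-point
    surprises. Counts are floored, and any samples still unassigned go to the
    senses with the largest fractional remainders (an assumed rounding rule).
    The proportions are assumed to sum to at most 1.
    """
    exact = [Fraction(str(p)) * total for p in distribution]
    counts = [floor(x) for x in exact]
    leftover = total - sum(counts)
    by_remainder = sorted(
        range(len(exact)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

            <p>For a sample of size 20 and distribution (0.5, 0.5) this gives 10 instances per sense, in agreement with the example in the text.</p>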
         </sec>
         <sec>
            <st>
               <p>Statistical methodology</p>
            </st>
            <p>To quantify the effects of sample size, sense distribution and difficulty of the task on the error rate, appropriate statistical methods were used. Friedman's test is the non-parametric analogue of a two-way analysis of variance (ANOVA); it makes no assumptions about the original distribution (e.g. normal vs. other) of the documents. Analysis of variance models are versatile statistical tools for studying the relation between error rates and sense distribution, sample size, and degree of difficulty of a task. These models do not require making assumptions about the nature of the statistical relation, nor do they require that sense distribution, sample size or degree of difficulty be quantitative variables.</p>
            <p>To understand the effects of increased sample size on the error rate, we stratified by the sense distribution and then tested the null hypothesis of no difference between the error rates obtained under the different sample sizes using the sign test. The sign test is a non-parametric test that does not impose any distributional assumptions, such as normality, on the data. It is useful for testing whether one random variable in a pair tends to have larger (smaller or simply different) values than the other random variable in the pair. In our case, the random variables in the pair are the error rates obtained under the different sample sizes used. For each abbreviation, each sense distribution and each cross-validation scheme, we have 6 pairs of random variables corresponding to the different combinations of two sample sizes. For each pair of error rates we have a sample of 30 observations. To exemplify, assume the pair consisted of the error rates obtained under sample sizes 20 and 40. Then the set of observations comprised the error rates obtained from the 30 simulation runs. The null hypothesis would be that the median error rate when the sample size is 20 equals the median error rate when the sample size is 40. Because, for each sense distribution, we had 6 such comparisons to make, we adjusted for multiple testing by setting the overall significance level &#945; = 0.05 and then dividing this by 6 to obtain an individual level of 0.0083 (Bonferroni adjustment).</p>
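            <p>The paired comparison described above can be sketched as an exact two-sided sign test combined with a Bonferroni-adjusted significance level. This is an illustrative implementation under stated assumptions, not the analysis code used in the study: the function names are hypothetical, tied pairs are dropped, and the p-value is computed from the binomial distribution with success probability 0.5.</p>

```python
from math import comb


def sign_test_p_value(errors_a, errors_b):
    """Exact two-sided sign test for paired observations.

    Pairs with equal values are discarded (an assumption of this sketch).
    Under the null hypothesis the number of positive differences follows a
    Binomial(n, 0.5) distribution.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni_level(overall_alpha, num_comparisons):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return overall_alpha / num_comparisons


# Four sample sizes give comb(4, 2) = 6 pairwise comparisons.
NUM_PAIRS = comb(4, 2)
ALPHA_PER_TEST = bonferroni_level(0.05, NUM_PAIRS)
```

            <p>A comparison would be declared significant in this sketch when the p-value returned for the 30 paired error rates falls below the per-test level.</p>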
            <p>We computed the standard deviation of the error rate as follows. Recall that for each abbreviation, each sense distribution and each sample size we ran the experiment 30 times. Let <it>p</it>(<it>i</it>) denote the error rate for the <it>i</it>th data set, <it>i </it>= 1,2,...,30. The error rate was computed using both a 5-fold and a 10-fold cross-validation scheme. Let the size of the training set be denoted by <it>n</it>. For example, when the total sample size is 20 and 5-fold cross-validation is used, the size of the training set is 16, while if the sample size is 80 the size of the training set is 64. For each of the 30 runs we estimated the standard error using the formula:<m:math name="1471-2105-7-334-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msqrt><m:mrow><m:mi>p</m:mi><m:mrow><m:mo>(</m:mo><m:mi>i</m:mi><m:mo>)</m:mo></m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mn>1</m:mn><m:mo>&#8722;</m:mo><m:mi>p</m:mi><m:mrow><m:mo>(</m:mo><m:mi>i</m:mi><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>/</m:mo><m:mi>n</m:mi></m:mrow></m:msqrt></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaGcaaqaaiabdchaWnaabmGabaGaemyAaKgacaGLOaGaayzkaaWaaeWaceaacqaIXaqmcqGHsislcqWGWbaCdaqadiqaaiabdMgaPbGaayjkaiaawMcaaaGaayjkaiaawMcaaiabc+caViabd6gaUbWcbeaaaaa@3B18@</m:annotation></m:semantics></m:math>. The estimate of the standard error was then obtained by averaging the above values over the 30 runs.</p>
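            <p>The computation described above can be written out as a short sketch. The function names are hypothetical; following the text, <it>n </it>is taken to be the size of the training set under k-fold cross-validation, and the per-run values of the square root of p(i)(1 - p(i))/n are averaged over the runs.</p>

```python
from math import sqrt


def training_size(total, folds):
    """Training-set size when one of `folds` equal parts is held out."""
    return total * (folds - 1) // folds


def mean_standard_error(error_rates, n):
    """Average of sqrt(p * (1 - p) / n) over the per-run error rates p."""
    values = [sqrt(p * (1 - p) / n) for p in error_rates]
    return sum(values) / len(values)
```

            <p>With a total sample size of 20 and 5-fold cross-validation, the hypothetical <it>training_size </it>helper returns 16, and with a sample size of 80 it returns 64, as in the examples in the text.</p>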
         </sec>
         <sec>
            <st>
               <p>Gene ambiguity for mining MEDLINE</p>
            </st>
            <p>To determine the extent of the gene ambiguity problem in MEDLINE, we searched MEDLINE abstracts to determine how many abstracts contained gene symbols that were ambiguous with general English words or biomedical terms. We formed a mouse gene symbol list by retrieving all gene symbols, names and synonyms for the mouse species from Entrez Gene <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, the gene-specific database at the NCBI. Then we compared this gene symbol list with a general English word list (Webster's 2<sup>nd </sup>international dictionary) and with the UMLS term list (from UMLS Metathesaurus 2005AA, after removing all bio-molecular entities with semantic types "Gene or Genome", "Biologically Active Substance", "Amino Acid", "Peptide or Protein", "Enzyme", "Immunologic Factor and Receptor"; see <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> for details) via case-insensitive exact string match. Two ambiguous gene symbol lists were formed as a result of the comparisons: a gene-English list (containing gene symbols ambiguous with general English words) and a gene-UMLS list (containing gene symbols ambiguous with biomedical terms). We also formed a pool of MEDLINE abstracts by collecting all abstracts that were related to mouse genes using the <it>gene2pubmed </it>file from Entrez Gene (downloaded on 1/2006), which led to 82,922 abstracts in the pool. We performed a case-insensitive search on each abstract in the pool to determine the number of abstracts that contained at least one word in each of the above two lists, so that we could determine the percentage of abstracts that contained a word that was ambiguous with an English word or with a UMLS term, respectively. However, there was a concern that a very limited set of words might have accounted for the vast majority of the ambiguity. 
Therefore, for each ambiguous word, we calculated its frequency, which is defined as the ratio between the number of abstracts containing the word and the total number of abstracts in the pool. For example, the word "brown" occurred in 399 abstracts and therefore had a frequency of 399/82922 = 0.48%. For each threshold, we removed ambiguous words with frequencies higher than that threshold and re-calculated the percentage of abstracts that contained the remaining ambiguous words. We also recorded the percentage of ambiguous words that were removed from the ambiguous word list for the different thresholds. We removed words with frequencies higher than 10%, 1%, 0.1% and 0.05% from the two lists for the mouse. Results showed that the percentages of abstracts containing the remaining ambiguous words were 80.9%, 46.2%, 13.5% and 7.2% respectively for gene-English ambiguity, and 89.8%, 68.6%, 24.0% and 13.4% respectively for gene-UMLS ambiguity. The percentages of ambiguous words that were removed from the list for the different thresholds (10%, 1%, 0.1%, 0.05%) were 0.8% (8/1065), 4.8% (51/1065), 20.3% (216/1065) and 30.0% (319/1065) for gene-English ambiguity, and 1.0% (20/2064), 3.8% (79/2064), 21.2% (437/2064) and 30.8% (636/2064) for gene-UMLS ambiguity. The same study, which was also performed for the fly, showed similar results, but with slightly higher ambiguity rates. For a more complete description of this study and the results, please [see <supplr sid="S1">additional file 1</supplr>].</p>
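            <p>The threshold analysis described above can be sketched as follows. This is a toy illustration under stated assumptions, not the code used in the study: the function names are hypothetical, each abstract is assumed to be represented as a set of lower-cased words, and the list of ambiguous words is assumed to be non-empty.</p>

```python
def word_frequencies(abstracts, ambiguous_words):
    """Fraction of abstracts that contain each ambiguous word.

    abstracts is a list of sets of lower-cased words, one set per abstract.
    """
    total = len(abstracts)
    return {
        word: sum(1 for words in abstracts if word in words) / total
        for word in ambiguous_words
    }


def coverage_after_threshold(abstracts, ambiguous_words, threshold):
    """Drop ambiguous words more frequent than threshold, then recompute.

    Returns a pair: the fraction of abstracts that still contain at least one
    remaining ambiguous word, and the fraction of ambiguous words removed.
    """
    freqs = word_frequencies(abstracts, ambiguous_words)
    kept = {word for word, freq in freqs.items() if threshold >= freq}
    removed_fraction = (len(freqs) - len(kept)) / len(freqs)
    covered = sum(1 for words in abstracts if not kept.isdisjoint(words))
    return covered / len(abstracts), removed_fraction
```

            <p>Running the hypothetical <it>coverage_after_threshold </it>helper once per threshold (for instance 0.1, 0.01, 0.001 and 0.0005) would produce the two series of percentages reported above for a given word list.</p>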
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>Supplementary material for gene ambiguity for mining MEDLINE</p>
               </text>
               <file name="1471-2105-7-334-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>HX carried out data collection, programming, and the experiments using SVM, and drafted the manuscript. RD participated in the statistical analysis of the results. MM and CF conceived of the study, participated in its design and coordination, and helped to draft the manuscript. MM also performed statistical analysis and interpreted the results. HL advised on the design of the study. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported in part by Grants R01 LM7659 and R01 LM8635 from the National Library of Medicine, and Grants NSF-DMS-0504957 and NSF-IIS-0430743 from the National Science Foundation. We would like to thank Lyudmila Shagina for providing technical support.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Text-mining and information-retrieval services for molecular biology</p>
            </title>
            <aug>
               <au>
                  <snm>Krallinger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>224</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1175978</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998455</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-7-224</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Mining the biomedical literature in the genomic era: an overview</p>
            </title>
            <aug>
               <au>
                  <snm>Shatkay</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>821</fpage>
            <lpage>855</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652703322756104</pubid>
                  <pubid idtype="pmpid" link="fulltext">14980013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Toward information extraction: identifying protein names from biological papers</p>
            </title>
            <aug>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tsunoda</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Takagi</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>1998</pubdate>
            <volume>707&#8211;18</volume>
            <fpage>707</fpage>
            <lpage>718</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</p>
            </title>
            <aug>
               <au>
                  <snm>Aronson</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Proc AMIA Symp</source>
            <pubdate>2001</pubdate>
            <volume>17&#8211;21</volume>
            <fpage>17</fpage>
            <lpage>21</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Text-based discovery in biomedicine: the architecture of the DAD-system</p>
            </title>
            <aug>
               <au>
                  <snm>Weeber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Aronson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Mork</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>de Jong-van den Berg</snm>
                  <fnm>LT</fnm>
               </au>
               <au>
                  <snm>Vos</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proc AMIA Symp</source>
            <pubdate>2000</pubdate>
            <volume>903&#8211;7</volume>
            <fpage>903</fpage>
            <lpage>907</lpage>
         </bibl>
         <bibl id="B6">
            <aug>
               <au>
                  <cnm>NLM</cnm>
               </au>
            </aug>
            <source>UMLS Knowledge Sources</source>
            <edition>11</edition>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B7">
            <aug>
               <au>
                  <snm>Aronson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Shooshan</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Ambiguity of UMLS metathesaurus 2004 Edition</source>
            <pubdate>2004</pubdate>
            <url>http://skr.nlm.nih.gov/papers/references/ambiguity04.pdf</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Am Med Inform Assoc</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>621</fpage>
            <lpage>636</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">349379</pubid>
                  <pubid idtype="pmpid" link="fulltext">12386113</pubid>
                  <pubid idtype="doi">10.1197/jamia.M1101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Gene terms and English words: An ambiguous mix</p>
            </title>
            <aug>
               <au>
                  <snm>Sehgal</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bodenreider</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>SIGIR'04 Workshop on Search and Discovery in BioInformatics</source>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Disambiguating proteins, genes, and RNA in text: a machine learning approach</p>
            </title>
            <aug>
               <au>
                  <snm>Hatzivassiloglou</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Duboue</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Rzhetsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S97</fpage>
            <lpage>106</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11472998</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Gene name ambiguity of eukaryotic nomenclatures</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>248</fpage>
            <lpage>256</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth496</pubid>
                  <pubid idtype="pmpid" link="fulltext">15333458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Distribution of information in biomedical abstracts and full-text publications</p>
            </title>
            <aug>
               <au>
                  <snm>Schuemie</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Weeber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schijvenaars</snm>
                  <fnm>BJA</fnm>
               </au>
               <au>
                  <snm>van Mulligen</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>van der Eijk</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Jelier</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2597</fpage>
            <lpage>2604</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth291</pubid>
                  <pubid idtype="pmpid" link="fulltext">15130936</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Thesaurus-based disambiguation of gene symbols</p>
            </title>
            <aug>
               <au>
                  <snm>Schijvenaars</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Mons</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Weeber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schuemie</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>van Mulligen</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Wain</snm>
                  <fnm>HM</fnm>
               </au>
               <etal/>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>149</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1183190</pubid>
                  <pubid idtype="pmpid" link="fulltext">15958172</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Entrez Gene: Gene-centered information at NCBI</p>
            </title>
            <aug>
               <au>
                  <snm>Maglott</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ostell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D54</fpage>
            <lpage>D58</lpage>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Syntax and the problem of multiple meaning</p>
            </title>
            <aug>
               <au>
                  <snm>Yngve</snm>
                  <fnm>VH</fnm>
               </au>
            </aug>
            <source>Machine Translation of Languages</source>
            <publisher>New York, John Wiley &amp; Sons</publisher>
            <pubdate>1955</pubdate>
            <fpage>208</fpage>
            <lpage>226</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Comparative experiments on disambiguating word senses: An illustration of the role of bias in machine learning</p>
            </title>
            <aug>
               <au>
                  <snm>Mooney</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Proc 1996 Conf on Empirical Methods in Natural Language Processing</source>
            <fpage>82</fpage>
            <lpage>91</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>HB</fnm>
               </au>
            </aug>
            <source>Proc 34th Ann Meeting Assoc for Comput Ling</source>
            <fpage>40</fpage>
            <lpage>47</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Combination of contextual features for word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Merkel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>SENSEVAL-2 Workshop</source>
            <fpage>123</fpage>
            <lpage>127</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Word sense disambiguation using decomposable models</p>
            </title>
            <aug>
               <au>
                  <snm>Bruce</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wiebe</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the Thirty-second Annual Meeting of the Association of Computational Linguistics</source>
            <fpage>139</fpage>
            <lpage>146</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>YK</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>HT</fnm>
               </au>
            </aug>
            <source>Proc EMNLP</source>
            <pubdate>2002</pubdate>
            <fpage>41</fpage>
            <lpage>48</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>SENSEVAL-2</p>
            </title>
            <aug>
               <au>
                  <snm>Cotton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Edmonds</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kilgarriff</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Palmer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <pubdate>1998</pubdate>
            <url>http://www.sle.sharp.co.uk/senseval2/</url>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Combining lexical and syntactic features for supervised word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Mohammad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc of the CoNLL</source>
         </bibl>
         <bibl id="B23">
            <aug>
               <au>
                  <snm>Wilks</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Fass</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Plate</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Slator</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Providing Machine Tractable Dictionary Tools</source>
            <publisher>Cambridge, MA: MIT Press</publisher>
            <pubdate>1990</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Statistically-guided word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Liddy</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Paik</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>AAAI Fall Symp 93</source>
            <fpage>98</fpage>
            <lpage>107</lpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders</p>
            </title>
            <aug>
               <au>
                  <snm>Hamosh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Amberger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bocchini</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valle</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McKusick</snm>
                  <fnm>VA</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>52</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99152</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752252</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.52</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: Preliminary experiment</p>
            </title>
            <aug>
               <au>
                  <snm>Humphrey</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Kilicoglu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Demner-Fushman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rindflesch</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Journal of the American Society for Information Science and Technology</source>
            <pubdate>2006</pubdate>
            <volume>57</volume>
            <fpage>96</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/asi.20257</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>New Techniques for Disambiguation in Natural Language and Their Application to Biological Text</p>
            </title>
            <aug>
               <au>
                  <snm>Ginter</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Boberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Järvinen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Salakoski</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>605</fpage>
            <lpage>621</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>AZuRE, a scalable system for automated term disambiguation of gene and protein names</p>
            </title>
            <aug>
               <au>
                  <snm>Podowski</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Cleary</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Goncharoff</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Amoutzias</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>WS</fnm>
               </au>
            </aug>
            <source>Proc 2004 IEEE CSB</source>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A multi-aspect comparison study of supervised word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Teller</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Am Med Inform Assoc</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>320</fpage>
            <lpage>331</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">436083</pubid>
                  <pubid idtype="pmpid" link="fulltext">15064284</pubid>
                  <pubid idtype="doi">10.1197/jamia.M1533</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Effects of information and machine learning algorithms on word sense disambiguation with small datasets</p>
            </title>
            <aug>
               <au>
                  <snm>Leroy</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rindflesch</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Int J Med Inform</source>
            <pubdate>2005</pubdate>
            <volume>74</volume>
            <fpage>573</fpage>
            <lpage>585</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ijmedinf.2005.03.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">15897005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Resolving abbreviations to their senses in Medline</p>
            </title>
            <aug>
               <au>
                  <snm>Gaudan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kirsch</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Rebholz-Schuhmann</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3658</fpage>
            <lpage>3664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti586</pubid>
                  <pubid idtype="pmpid" link="fulltext">16037121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Word sense disambiguation in the biomedical domain: an overview</p>
            </title>
            <aug>
               <au>
                  <snm>Schuemie</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Kors</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Mons</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <fpage>554</fpage>
            <lpage>565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/cmb.2005.12.554</pubid>
                  <pubid idtype="pmpid" link="fulltext">15952878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The use of ranks to avoid the assumption of normality implicit in the analysis of variance</p>
            </title>
            <aug>
               <au>
                  <snm>Friedman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1937</pubdate>
            <volume>32</volume>
            <fpage>675</fpage>
            <lpage>701</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2279372</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>In defense of one-vs-all classification</p>
            </title>
            <aug>
               <au>
                  <snm>Rifkin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Klautau</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>101</fpage>
            <lpage>141</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Analysis of variance of cross-validation estimators of the generalization error</p>
            </title>
            <aug>
               <au>
                  <snm>Markatou</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Biswas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hripcsak</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>1127</fpage>
            <lpage>1168</lpage>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Distinguishing systems and distinguishing senses: New evaluation tools for word sense disambiguation</p>
            </title>
            <aug>
               <au>
                  <snm>Resnik</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yarowsky</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Natural Lang Eng</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>113</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1017/S1351324999002211</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Distinguishing word senses in untagged text</p>
            </title>
            <aug>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bruce</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Second Conference on Empirical Methods in Natural Language Processing</source>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A comparison of methods for multi-class support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Hsu</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Neural Networks</source>
            <pubdate>2002</pubdate>
            <volume>13</volume>
            <fpage>415</fpage>
            <lpage>425</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/72.991427</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Approximate statistical tests for comparing supervised classification learning algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Dietterich</snm>
                  <fnm>TG</fnm>
               </au>
            </aug>
            <source>Neural Computation</source>
            <pubdate>1998</pubdate>
            <volume>10</volume>
            <fpage>1895</fpage>
            <lpage>1924</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1162/089976698300017197</pubid>
                  <pubid idtype="pmpid">9744903</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>On comparing classifiers: Pitfalls to avoid and a recommended approach</p>
            </title>
            <aug>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Data Mining and Knowledge Discovery</source>
            <pubdate>1997</pubdate>
            <volume>1</volume>
            <fpage>317</fpage>
            <lpage>328</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1023/A:1009752403260</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Minimizing manual annotation cost in supervised training from corpora</p>
            </title>
            <aug>
               <au>
                  <snm>Engelson</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Dagan</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>34th Annual Meeting of the Association for Computational Linguistics</source>
            <fpage>319</fpage>
            <lpage>326</lpage>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Automatic extraction of acronym-meaning pairs from MEDLINE databases</p>
            </title>
            <aug>
               <au>
                  <snm>Pustejovsky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Castano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cochran</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kotecki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Morrell</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Medinfo</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <fpage>371</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11604766</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <aug>
               <au>
                  <snm>Hand</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Construction and assessment of classification rules</source>
            <publisher>Chichester, England: John Wiley &amp; Sons</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Assessing Classification Rules</p>
            </title>
            <aug>
               <au>
                  <snm>Hand</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Journal of Applied Statistics</source>
            <pubdate>1994</pubdate>
            <volume>21</volume>
            <fpage>3</fpage>
            <lpage>16</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Effect of sample size in classifier design</p>
            </title>
            <aug>
               <au>
                  <snm>Fukunaga</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
            <pubdate>1989</pubdate>
            <volume>11</volume>
            <fpage>873</fpage>
            <lpage>885</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/34.31448</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Spider-MachineLearning Package</p>
            </title>
            <aug>
               <au>
                  <snm>Weston</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elisseeff</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bakır</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sinz</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <pubdate>2005</pubdate>
            <url>http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Multiclass support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Weston</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Watkins</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proceedings of ESANN99</source>
         </bibl>
      </refgrp>
   </bm>
</art>
