
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd">
    
    <titleInfo>
        <title>Evaluating Speech Separation Systems</title>
    </titleInfo>
    <name type="personal" ID="de171">
        <namePart type="family">Ellis</namePart>
        <namePart type="given">Daniel P. W.</namePart>
        <role>
            <roleTerm type="text">author</roleTerm>
        </role>
        <affiliation>Columbia University. Electrical Engineering</affiliation>
    </name>
    <name type="corporate">
        <namePart>Columbia University. Electrical Engineering</namePart>
        <role>
            <roleTerm type="text">originator</roleTerm>
        </role>
    </name>
    <typeOfResource>text</typeOfResource>
    <genre>Book chapters</genre>
    
    <originInfo>
        <dateIssued encoding="w3cdtf" keyDate="yes">2004</dateIssued>
        <edition>manuscript version</edition>
    </originInfo>
    
    <language>
        <languageTerm type="text">English</languageTerm>
    </language>
    <abstract>Common evaluation standards are critical to making progress in any field, but they can also distort research by shifting all the attention to a limited subset of the problem. Here, we consider the problem of evaluating algorithms for speech separation and acoustic scene analysis, noting some weaknesses of existing measures, and making some suggestions for future evaluations. We take the position that the most relevant &apos;ground truth&apos; for sound mixture organization is the set of sources perceived by human listeners, and that the best evaluation standards would measure the machine&apos;s match to this perception at a level abstracted away from the low-level signal features most often considered in signal processing.</abstract>
    <subject>
        <topic>Communication</topic>
    </subject>
    <subject>
        <topic>Physiological psychology</topic>
    </subject>
    <relatedItem type="host">
        <titleInfo>
            <title>Speech Separation by Humans and Machines</title>
        </titleInfo>
        <name type="personal">
            <namePart type="family">Divenyi</namePart>
            <namePart type="given">Pierre</namePart>
            <role>
                <roleTerm type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <place>
                <placeTerm type="text">New York</placeTerm>
            </place>
            <publisher>Kluwer</publisher>
            <dateIssued encoding="w3cdtf">2005</dateIssued>
        </originInfo>
        <part>
            <extent unit="page">
                <start>295</start>
                <end>304</end>
            </extent>
        </part>
        <identifier type="doi">10.1007/0-387-22794-6_20</identifier>
    </relatedItem>
    <identifier type="hdl">http://hdl.handle.net/10022/AC:P:12564</identifier>
    
    <location>
        <physicalLocation authority="marcorg">NNC</physicalLocation>
    </location>
    
    <recordInfo>
        <recordContentSource authority="marcorg">NNC</recordContentSource>
        <recordCreationDate encoding="w3cdtf">2012-02-15T14:22:02-05:00</recordCreationDate>
        <recordChangeDate encoding="w3cdtf">2012-02-15T14:36:39-05:00</recordChangeDate>
        <recordIdentifier>6568</recordIdentifier>
        <languageOfCataloging>
            <languageTerm authority="iso639-2b">eng</languageTerm>
        </languageOfCataloging>
    </recordInfo>
    
</mods>
