Academic Commons Search Results
http://academiccommons.columbia.edu/catalog.rss?f%5Bauthor_facet%5D%5B%5D=Jebara%2C+Tony&q=&rows=500&sort=record_creation_date+desc

Approximating the Bethe partition function
http://academiccommons.columbia.edu/catalog/ac:171018
Weller, Adrian; Jebara, Tony
http://dx.doi.org/10.7916/D8M043F6
Fri, 21 Feb 2014 00:00:00 +0000
When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy F, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced for attractive binary pairwise MRFs which is guaranteed to return an ϵ-approximation to the global minimum of F in polynomial time provided the maximum degree Δ = O(log n), where n is the number of variables. Here we significantly improve this algorithm and derive several results including a new approach based on analyzing first derivatives of F, which leads to performance that is typically far superior and yields a fully polynomial-time approximation scheme (FPTAS) for attractive models without any degree restriction. Further, the method applies to general (non-attractive) models, though with no polynomial time guarantee in this case, leading to the important result that approximating the log of the Bethe partition function, log Z_B = −min F, for a general model to additive ϵ-accuracy may be reduced to a discrete MAP inference problem. We explore an application to predicting equipment failure on an urban power network and demonstrate that the Bethe approximation can perform well even when BP fails to converge.
Computer science
aw2506, tj2008
Computer Science
Technical reports

Bethe Bounds and Approximating the Global Optimum
http://academiccommons.columbia.edu/catalog/ac:156060
Weller, Adrian; Jebara, Tony
http://hdl.handle.net/10022/AC:P:18853
Mon, 28 Jan 2013 00:00:00 +0000
Inference in general Markov random fields (MRFs) is NP-hard, though identifying the maximum a posteriori (MAP) configuration of pairwise MRFs with submodular cost functions is efficiently solvable using graph cuts. Marginal inference, however, even for this restricted class, is in #P. We prove new formulations of derivatives of the Bethe free energy, provide bounds on the derivatives and bracket the locations of stationary points, introducing a new technique called Bethe bound propagation. Several results apply to pairwise models whether associative or not. Applying these to discretized pseudo-marginals in the associative case, we present a polynomial time approximation scheme for global optimization provided the maximum degree is O(log n), and discuss several extensions.
Computer science
aw2506, tj2008
Computer Science
Reports

An SVM learning approach to robotic grasping
http://academiccommons.columbia.edu/catalog/ac:154419
Pelossof, Raphael; Miller, Andrew T.; Allen, Peter K.; Jebara, Tony
http://hdl.handle.net/10022/AC:P:15186
Mon, 05 Nov 2012 00:00:00 +0000
Finding appropriate stable grasps for a hand (either robotic or human) on an arbitrary object has proved to be a challenging problem. The space of grasping parameters coupled with the degrees-of-freedom and geometry of the object to be grasped creates a high-dimensional, non-smooth manifold. Traditional search methods applied to this manifold are typically not powerful enough to find appropriate stable grasping solutions, let alone optimal grasps. We address this issue in this paper, which attempts to find optimal grasps of objects using a grasping simulator. Our unique approach to the problem involves a combination of numerical methods to recover parts of the grasp quality surface with any robotic hand, and contemporary machine learning methods to interpolate that surface, in order to find the optimal grasp.
Robotics
pka1, tj2008
Computer Science
Articles

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments
http://academiccommons.columbia.edu/catalog/ac:148573
Mandel, Michael I.; Ellis, Daniel P. W.; Jebara, Tony
http://hdl.handle.net/10022/AC:P:13686
Wed, 27 Jun 2012 00:00:00 +0000
We present a method for localizing and separating sound sources in stereo recordings that is robust to reverberation and does not make any assumptions about the source statistics. The method consists of a probabilistic model of binaural multisource recordings and an expectation maximization algorithm for finding the maximum likelihood parameters of that model. These parameters include distributions over delays and assignments of time-frequency regions to sources. We evaluate this method against two comparable algorithms on simulations of simultaneous speech from two or three sources. Our method outperforms the others in anechoic conditions and performs as well as the better of the two in the presence of reverberation.
Electrical engineering, Applied mathematics
de171, tj2008
Computer Science, Electrical Engineering
Articles

Structured Prediction Models for Chord Transcription of Music Audio
http://academiccommons.columbia.edu/catalog/ac:148442
Weller, Adrian Vivian; Ellis, Daniel P. W.; Jebara, Tony
http://hdl.handle.net/10022/AC:P:13651
Tue, 26 Jun 2012 00:00:00 +0000
Chord sequences are a compact and useful description of music, representing each beat or measure in terms of a likely distribution over individual notes without specifying the notes exactly. Transcribing music audio into chord sequences is essential for harmonic analysis, and would be an important component in content-based retrieval and indexing, but accuracy rates remain fairly low. In this paper, the existing 2008 LabROSA Supervised Chord Recognition System is modified by using different machine learning methods for decoding structural information, thereby achieving significantly superior results. Specifically, the hidden Markov model is replaced by a large margin structured prediction approach (SVMstruct) using an enlarged feature space. Performance is significantly improved by incorporating features from future (but not past) frames. The benefit of SVMstruct increases with the size of the training set, as might be expected when comparing discriminative and generative models. Without yet exploring non-linear kernels, these improvements lead to state-of-the-art performance in chord transcription. The techniques could prove useful in other sequential learning tasks which currently employ HMMs.
Electrical engineering, Applied mathematics
aw2506, de171, tj2008
Computer Science, Electrical Engineering
Articles

Behavior-Based Network Traffic Synthesis
http://academiccommons.columbia.edu/catalog/ac:142667
Song, Yingbo; Stolfo, Salvatore; Jebara, Tony
http://hdl.handle.net/10022/AC:P:12017
Fri, 16 Dec 2011 00:00:00 +0000
Modern network security research has demonstrated a clear necessity for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenges of removing sensitive content from the data beforehand. Network Data Anonymization is an emerging field dedicated to solving this problem, with a main focus on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks, particularly that they ignore the behavioral idiosyncrasies of network protocols, applications, and users. Network traffic synthesis (or simulation) is a closely related complementary approach which, while more difficult to execute accurately, has the potential for far greater flexibility. This paper leverages the statistical idiosyncrasies of network behavior to augment anonymization and traffic-synthesis techniques through machine-learning models specifically designed to capture host-level behavior. We present the design of a system that can automatically learn models for network host behavior across time, then use these models to replicate the original behavior and to interpolate across gaps in the original traffic, and we demonstrate how to generate new, diverse behaviors. Further, we measure the similarity of the synthesized data to the original, providing us with a quantifiable estimate of data fidelity.
Computer science
ys2242, sjs11, tj2008
Computer Science
Articles

Markov Models for Network-Behavior Modeling and Anonymization
http://academiccommons.columbia.edu/catalog/ac:135492
Song, Yingbo; Stolfo, Salvatore; Jebara, Tony
http://hdl.handle.net/10022/AC:P:10682
Mon, 11 Jul 2011 00:00:00 +0000
Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks, particularly that they ignore the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably model the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity which fits together naturally with network-data modeling.
Computer science
ys2242, sjs11, tj2008
Computer Science
Technical reports

Dynamical Systems Trees
http://academiccommons.columbia.edu/catalog/ac:109721
Jebara, Tony; Howard, Andrew
http://hdl.handle.net/10022/AC:P:29201
Tue, 26 Apr 2011 00:00:00 +0000
We propose dynamical systems trees (DSTs) as a flexible model for describing multiple processes that interact via a hierarchy of aggregating processes. DSTs extend nonlinear dynamical systems to an interactive group scenario. Various individual processes interact as communities and sub-communities in a tree structure that is unrolled in time. To accommodate nonlinear temporal activity, each individual leaf process is modeled as a dynamical system containing discrete and/or continuous hidden states with discrete and/or Gaussian emissions. Subsequent, higher level parent processes act like hidden Markov models that mediate the interaction between leaf processes or between other parent processes in the hierarchy. Aggregator chains are parents of the child processes they combine and mediate, yielding a compact overall parameterization. We provide tractable inference and learning algorithms for arbitrary DST topologies via structured mean field. Experiments are shown for real trajectory data of tracked American football plays, where a DST tracks players as dynamical systems mediated by their team processes, which are mediated in turn by a top-level game process.
Computer science
tj2008
Computer Science
Technical reports

Tree Dependent Identically Distributed Learning
http://academiccommons.columbia.edu/catalog/ac:110465
Jebara, Tony; Long, Philip M.
http://hdl.handle.net/10022/AC:P:29432
Thu, 21 Apr 2011 00:00:00 +0000
We view a dataset of points or samples as having an underlying, yet unspecified, tree structure and exploit this assumption in learning problems. Such a tree structure assumption is equivalent to treating a dataset as being tree dependent identically distributed or tdid and preserves exchangeability. This extends traditional iid assumptions on data since each datum can be sampled sequentially after being conditioned on a parent. Instead of hypothesizing a single best tree structure, we infer a richer Bayesian posterior distribution over tree structures from a given dataset. We compute this posterior over (directed or undirected) trees via the Laplacian of conditional distributions between pairs of input data points. This posterior distribution is efficiently normalized by the Laplacian's determinant and also facilitates novel maximum likelihood estimators, efficient expectations and other useful inference computations. In a classification setting, tdid assumptions yield a criterion that maximizes the determinant of a matrix of conditional distributions between pairs of input and output points. This leads to a novel classification algorithm we call the Maximum Determinant Machine. Unsupervised and supervised experiments are shown.
Computer science
tj2008
Computer Science, Center for Computational Learning Systems
Technical reports

Square Root Propagation
http://academiccommons.columbia.edu/catalog/ac:110433
Howard, Andrew; Jebara, Tony
http://hdl.handle.net/10022/AC:P:29422
Thu, 21 Apr 2011 00:00:00 +0000
We propose a message propagation scheme for numerically stable inference in Gaussian graphical models which can otherwise be susceptible to errors caused by finite numerical precision. We adapt square root algorithms, popular in Kalman filtering, to graphs with arbitrary topologies. The method consists of maintaining potentials and generating messages that involve the square root of precision matrices. Combining this with the machinery of the junction tree algorithm leads to an efficient and numerically stable algorithm. Experiments are presented to demonstrate the robustness of the method to numerical errors that can arise in complex learning and inference problems.
Computer science, Applied mathematics
tj2008
Computer Science
Technical reports

Multi Facet Learning in Hilbert Spaces
http://academiccommons.columbia.edu/catalog/ac:110474
Kondor, Risi; Csányi, Gábor; Ahnert, Sebastian E.; Jebara, Tony
http://hdl.handle.net/10022/AC:P:29435
Thu, 21 Apr 2011 00:00:00 +0000
We extend the kernel based learning framework to learning from linear functionals, such as partial derivatives. The learning problem is formulated as a generalized regularized risk minimization problem, possibly involving several different functionals. We show how to reduce this to conventional kernel based learning methods and explore a specific application in Computational Condensed Matter Physics.
Computer science
tj2008
Computer Science
Technical reports