Posted on 14-02-2008
Filed Under (documentation) by Linux Poweruser Programmer

The Theoretical Limits of Statistical High Dimensional Nearest Neighbor Algor…

55 min – Sep 6, 2007

Google Tech Talks
September 6, 2007

Suppose we have $n$ Bernoulli(1/2) long sequences of bits. Let $n-2m$ sequences be completely independent, while the remaining $2m$ sequences are composed of $m$ independent pairs. The interdependence within each pair is that their bits agree with probability $1/20$. The exponent $1/p$ is optimal in a large natural class of algorithms which we name Bucketing Codes. Moreover if one sequence out of each pair belongs to a known set of $n^{(2p-1)^{2}-epsilon}$ sequences, than pairing can be done using order $n$ comparisons!

These are extended to a general discrete independent data model. The performance of Bucketing Codes is bounded by a newly defined Bucketing Information. That bound can be asymptotically attained by randomly constructed bucketing codes. Our Information Theory inspired approach is contrasted with the worst case analysis dominating the literature.

Speaker: Moshe Dubiner
video
http://video.google.com/videoplay?docid=8154144142390189413


Sphere: Related Content

Tags: , , , , ,

Related posts

(0) Comments    Read More   
Post a Comment
Name:
Email:
Website:
Comments: