LiveJournal revival!!

LiveJournal has been a powerful force in my life, as it has been for many people of my demographic (English-speaking nerds born in 1975-1985). It started winding down in 2010. In the last couple of years, I've been talking about the idea of a Christmas-time revival, and to my happy surprise someone else has decided to make it happen. Maybe it will stick this time, though few of my friends can spare the time nowadays and many of us now get our social needs met IRL (especially those of us who came to the Bay Area).

mirror of this post

lit review: Kullback-Leibler divergence in Statistics PhD textbooks

Bickel, Klaassen, Ritov, Wellner
* p. 227, Re: estimating cell probabilities in a two-way contingency table with known marginals.  "Iterative Proportional Fitting converges to the minimum Kullback divergence estimator"

Casella & Berger (2nd Edition) says nothing

Gelman, Carlin, Stern, Rubin (2nd Edition)
* p.107, asymptotics of estimation under mis-specified model.
* p.586-588, details about this.

Hastie, Tibshirani, Friedman (2nd Edition)
* p. 561, KL "distance".  Used in definition of mutual information, on a section about ICA.

Hogg, McKean, Craig (6th edition) says nothing

* p.113-114, "Kullback-Leibler information number", in the chapter titled "Strong Consistency of Maximum-Likelihood Estimates"

* p.59: proof question: show that KL > 0 unless the two distributions are equal.
* p.156: consistency of MLE.
* p.466: solution to the proof question.

Lehmann & Romano
* p. 432: optimal test rejects for large values of T_n, and T_n converges to -K(P0, P1).
* p. 672: in non-parametric test for the mean, KL is used to define the most favorable distribution in H0.

van der Vaart
* p. 56: asymptotics of estimation under mis-specified model.
* p. 62: viewing MLE as an M-estimator.

* p. 125: consistency of MLE. "By LLN, M_n converges to -D(θ*, θ)"
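
A quick numerical sanity check of that last statement, as a minimal sketch. It assumes M_n is the average log-likelihood ratio between θ and θ* (the note above does not define it), and it uses unit-variance normals, where the KL divergence has the closed form (θ* - θ)^2 / 2. The parameter values and the sample size are arbitrary choices for illustration.

```r
set.seed(1)

theta_star <- 0   # true parameter (illustrative assumption)
theta      <- 1   # some other parameter value (illustrative assumption)
n          <- 1e5 # sample size (arbitrary)

# Draw data from the true model N(theta_star, 1)
x <- rnorm(n, mean = theta_star, sd = 1)

# M_n(theta): average log-likelihood ratio of theta against theta_star
M_n <- mean(dnorm(x, mean = theta, sd = 1, log = TRUE) -
            dnorm(x, mean = theta_star, sd = 1, log = TRUE))

# Closed-form KL divergence between N(theta_star, 1) and N(theta, 1)
D <- (theta_star - theta)^2 / 2

print(c(M_n = M_n, minus_D = -D))
```

With these settings the two printed numbers should be close to each other, and the gap should shrink as n grows.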

yoga and intersubjectivity: let's invent a language

(originally posted Oct 25, 2012, 01:40)

Perhaps one of the defining traits of "nerds" is a low level of body awareness, which comes with "spending too much time in the head". This may explain why yoga has been so revealing for me. I have been learning which sensations correspond to stretch, strain, and pain; and how to move muscles independently of other muscles (often my brain used to think of them as just one thing). Sometimes I need visual feedback to learn to control my muscles. I am lucky to have a teacher who understands how unintuitive this is for me.

I wish we had a standard language for naming specific sensations. I would like to convey precisely the twinge in my lower back, which might be a pinched nerve, but might just be soreness. If my teacher could feel what I feel, he would know what it was, but instead his judgement has to rely on my imperfect attempts at describing it.

When it comes to bodily sensations, we don't know how much subjectivity there is. Psychologists (psychophysicists) can often quantify the subjectivity of senses (say color), because even when words fail, they can do experiments to test whether subjects are able to detect tiny differences in stimuli (perhaps defining a metric on perceptual space, or more!), and then quantify how much people differ in this ability, in different regions of stimulus space. But when it comes to your body, it is much harder to stimulate a sensation to a precision worthy of being called "reproducible". And then there's habituation (which is also a problem for scientists trying to study smell).

Right now you could start a philosophical food fight by bringing up the label-switching problem (namely that, just because you and your teacher are in verbal agreement doesn't mean that your experiences agree), but I just want to be practical here: how can we develop a shared vocabulary that would allow me to better convey my sensation to my teacher, so that he may make a better guess about what is wrong with my back? Are there existing human cultures in which people can easily convey their bodily sensations to each other?

I think that the biggest obstacle here is establishing joint attention. It is easy to teach the names of visual stimuli to a seeing person. But when it comes to coining words to describe types of pain in the back, this becomes like two blind people trying to come up with words for categorizing shapes (they can experience shapes by touch, but without joint attention, i.e. let's say they are not allowed to pass shapes to each other).


Why are "the arts" traditionally visual and/or auditory? Because out of all our senses, vision and hearing are the only senses whose stimulus-response mapping is reliable enough. With the other senses, there is too much variation within and across people to have any control over their experience (which also explains why we have so few olfactory words/concepts). Smell and taste have very little spatiotemporal resolution. Touch may actually be a good candidate.


hyper-abstracted R contest

(originally posted December 02, 2009, 21:44)

## `*` is the hyper of `+`, `^` is the hyper of `*`
> hyper <- function(fn) function(a,b) Reduce(fn, rep(a,b))

> compose <- function(fn1,fn2) function(x) fn1(fn2(x))

> hyperoperation <- function(n) Reduce(compose,listRep(hyper,n))(`+`)

('rep(obj,n)' returns a vector, and 'listRep(obj,n)' a list, containing 'obj' n times. I had to invent 'listRep' for technical reasons: passing a closure to 'rep' returns an error, "object of type 'closure' is not subsettable".)
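
To try the definitions above, something has to stand in for 'listRep', which the post does not show. The sketch below assumes a one-line version built on 'lapply' (a guess, not the original helper) and repeats the three definitions so that it runs on its own.

```r
# Hypothetical stand-in for the post's 'listRep': a list holding 'obj' n times
listRep <- function(obj, n) lapply(seq_len(n), function(i) obj)

# Definitions copied from the post
hyper <- function(fn) function(a, b) Reduce(fn, rep(a, b))
compose <- function(fn1, fn2) function(x) fn1(fn2(x))
hyperoperation <- function(n) Reduce(compose, listRep(hyper, n))(`+`)

# n = 1 turns repeated addition into multiplication
print(hyperoperation(1)(3, 4))  # 3 + 3 + 3 + 3

# n = 2 turns repeated multiplication into exponentiation
print(hyperoperation(2)(3, 4))  # 3 * 3 * 3 * 3

# n = 3 repeats exponentiation, folding from the left
print(hyperoperation(3)(2, 3))  # (2^2)^2
```

Note that 'Reduce' folds from the left by default, so from n = 3 onward this builds a left-associated tower: hyperoperation(3)(3, 3) is (3^3)^3 = 19683, not 3^(3^3). Keep the arguments small, since each level materializes 'rep(a, b)'.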

get it yet?

my Burning Man checklist

This is my first Burn... 8 days from now.

== desert weather ==
* camelbak: BOUGHT
* goggles: BOUGHT
* dust mask: BOUGHT

== camping ==
* tent: BORROWED
* rebar: BOUGHT
* reflective material for cooling: BOUGHT

== sleeping ==
* ear plugs (gel): BOUGHT
* sleep mask: ORDERED
* sleeping bag: ORDERED
* self-inflating mattress (3" thick): ORDERED
* blanket: ALREADY HAVE

== ride arrangements ==
I need to ride with Victoria, since we are splitting a Will Call ticket.
We arranged a ride in a sedan transporting 4 Burners and our stuff, which I think is very tight.
There might not be room for most of my stuff, so I will try to find other people who can transport my bicycle.

== lights ==
* headlamp: ORDERED
* reflective tape: ORDERED
* blinky reflective vest: ORDERED
* solar-powered light: BOUGHT
* bicycle lights: NEED BATTERIES

== electricity ==
* solar-powered phone charger: THANKS, GOOGLE
* batteries:

== survival ==
* dish, mug, cutlery: BOUGHT
* water:
* food:
* clothing for cold:

== MOOP ==
* trash bags:

== other ==
* duct tape: BOUGHT


my taxes

I file taxes as a Resident Alien, which makes my tax situation pretty much identical to that of most American PhD students at Columbia. And yet, because of legal liability, the only qualified people who are willing to give us advice are tax professionals (whose time will cost at least $100).

So here are the basic calculations I do before I start working on my tax forms. This is not tax advice. This is not legal advice.

Add up the income:
* TA Wages, see W-2 form
* Stipend: look through bank account or MyColumbia, and add all the checks issued in 2012.
* Interest income: Chase sent me a form ("in lieu of 1099-INT"), informing me that I made $4.58 in interest, of which $0.00 was withheld. However, Chase charged an "Agent Admin Fee" of $4.58, which means that I'm going to pay tax on money that I never saw... So let's be thankful that Savings accounts have such crappy interest!

Add up the withholdings:
* TA Wages, see W-2 form
* Stipend: look through checks at MyColumbia. Check whether they withheld anything. In my case, they didn't.
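
The two tallies above amount to a couple of sums. Here is a minimal sketch in R; the wage, stipend, and wage-withholding figures are made-up placeholders, and only the $4.58 of interest and the zero withholdings on stipend and interest come from the notes above. It stops at the totals and does not attempt to compute the tax itself.

```r
# Income by source. 'ta_wages' and 'stipend' are hypothetical placeholders;
# 'interest' is the figure from the Chase form.
income <- c(ta_wages = 10000, stipend = 20000, interest = 4.58)

# Withholdings by source. The 'ta_wages' amount is a hypothetical placeholder;
# nothing was withheld from the stipend or the interest.
withheld <- c(ta_wages = 800, stipend = 0, interest = 0)

total_income   <- sum(income)
total_withheld <- sum(withheld)

print(c(total_income = total_income, total_withheld = total_withheld))
```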

Confusing things:
* My scholarship exactly cancels out tuition+fees. This means I don't need to look at the 1098-T, even though the university is obligated to send it to me. I think this form concerns the university's taxes wrt me, not my taxes.
* Unlike most international students, I should not receive a 1042-S (it's only for Non-Resident Aliens)

Since no money is being withheld from my stipend checks, I expect to owe money to the IRS, on the order of a few thousand per year.

My stipend is taxable


conditional inference; why completeness matters

Earlier this week, another piece of statistical theory fell into place for me, this time inspired by reading Cox & Hinkley.

One of the key principles expounded in this book is known as the "conditionality principle": given your model, if you can find a statistic that is ancillary (i.e. one whose distribution does not depend on the parameter of interest), then your likelihood function should be conditioned on it.

Now, if the minimal sufficient statistic is complete (as is the case in any full-rank exponential family), Basu's theorem tells us that any ancillary statistic will be independent of it, i.e. there is a clean separation between sufficient and ancillary. But in curved exponential families, it can happen that there is no maximal ancillary statistic, i.e. you may have multiple choices of ancillary statistic, but combining them yields a statistic that is no longer ancillary. This is a bit troubling to me, because it breaks the nice idea of a bijection between model and likelihood function.

Given a choice between two ancillaries, C&H advise selecting the one whose Conditional Fisher Information has the greater variance. It's not immediately obvious why one should do this, but I think this can be understood as the Conditional Fisher Information giving us a lens into the conditional likelihood function. For example, if the Conditional Fisher Information has 0 variance, it may be because the ancillary statistic doesn't add any information (as is the case when the minimal sufficient statistic is complete). However, it still seems plausible to me that the Conditional Fisher Information can be constant (independent of the ancillary statistic) even while the likelihood function is sensitive to it.

C&H also hint at a notion of partial sufficiency/efficiency and how to measure it: just compute a Conditional Fisher Information, conditioning on the proposed statistic.

(Since Fisher Information is an expectation, Conditional Fisher Information is an expectation under a conditional distribution; since the quantity on the LHS is a function of the sufficient statistic, conditioning on the sufficient statistic will not change anything, whereas conditioning on something insufficient can have the effect of making the log-likelihood smoother, and the Fisher Information smaller.) Conditioning on an ancillary, however, doesn't simply make the log-likelihood sharper: the average of the Conditional Fisher Information is just the Fisher Information.

[the last paragraph is probably wrong; please comment]
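
One case where the averaging claim can at least be checked by hand is the classic two-instrument example: a fair coin picks which of two instruments measures θ, each with normal errors of known standard deviation. The coin flip A is ancillary, the Conditional Fisher Information given A = a is 1 / sigma_a^2, and the Fisher Information of the pair (X, A) is the average of those values. The sketch below just does that arithmetic; the standard deviations are made-up numbers, and it says nothing about the more general claims above.

```r
# Hypothetical setup: instrument a in {1, 2} is picked with probability p_a[a]
sigma <- c(1, 10)     # measurement standard deviations (made-up values)
p_a   <- c(0.5, 0.5)  # fair coin

# Conditional Fisher Information about theta, given which instrument was used
cond_info <- 1 / sigma^2

# Its average over the ancillary, which equals the Fisher Information of (X, A)
avg_info <- sum(p_a * cond_info)

# Its variance over the ancillary, the quantity used to compare ancillaries
var_info <- sum(p_a * (cond_info - avg_info)^2)

print(c(avg_info = avg_info, var_info = var_info))
```

Here the Conditional Fisher Information is far from constant (1 versus 0.01), so conditioning on A changes the inference a lot even though the average is unchanged.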
