Disclaimer

These wafflings are submitted for public review of my own accord. I don’t have any official endorsement from any academic group. My only qualification is that I am a programmer working on the back end at the BCCVL.

Absence Data

So I’ve been getting to know the excellently crafted infrastructure here at the BCCVL, debugging various R scripts, processing data, and getting a feel for how the entire process happens. When I decided that I was going to give the wheels a spin myself (in my own way), the first thing that I really started to ponder was the absence data. Occurrence data seems plentiful and appeals as an intuitively useful thing: “I saw the creature here. It was a Koala; it was sleeping in a tree.” In other words, the animal was observed there, so it’s highly likely that the habitat is suitable for it. But saying that no Koalas are present in a particular location is a much stronger claim and, at least at face value, subject to reasonable doubt. Cue intense debate in the comments section! (I wish)

What if I think of occurrence data as corresponding to certainty, and absence data as corresponding to uncertainty? What if the species distribution model (SDM) wasn’t predicting occurrence/absence, but rather certainty/uncertainty? I think I might be able to use a model that tells me where something certainly should exist, but otherwise professes uncertainty. That seems like a reasonable place to start from.

What I am really trying to do is convince myself that using “pseudo absence data” might make sense. Pseudo absence data is where you randomly generate coordinates on the globe and pretend that no Koalas exist there, so to speak.
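To make that concrete, here is a minimal sketch of what “randomly generate coordinates on the globe” could look like. The function name `random_pseudo_absences` is made up for illustration, and sampling evenly by surface area on a perfect sphere is my own assumption, not something the BCCVL scripts are known to do.

```python
import math
import random


def random_pseudo_absences(n, seed=None):
    """Return n (latitude, longitude) pairs in degrees.

    Assumption: points are drawn evenly by surface area on a perfect
    sphere, ignoring land/sea and any study-area boundary.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # Taking asin of a uniform value in [-1, 1] avoids the pile-up of
        # points near the poles that a uniform latitude would produce.
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        points.append((lat, lon))
    return points


# Hypothetical usage: 1000 locations where we pretend no Koalas exist.
pseudo_absences = random_pseudo_absences(1000, seed=42)
```

In practice you would probably want to restrict the points to land, or to the region the occurrence records come from, but the idea is the same.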

If the features that we train with are sampled from stationary processes, it is reasonable to claim that the pseudo absence data should have little or no predictive relationship with the feature data. Hopefully the occurrence data depends on the feature data in some deducible or obvious way. Waving my hands a little bit, the pseudo absence data should correspond to noise, while the occurrence data should lead (at least indirectly) to some kind of detectable signal in the context of said noise. I think I’m happy to start from there. I’m sensing some deeper mathematics.
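The noise-versus-signal intuition can be sketched with entirely synthetic numbers. Everything here is invented for illustration: a single hypothetical temperature feature, pseudo absences drawn from the same distribution as the background, and occurrences clustered around a made-up preferred temperature. It is a toy under those assumptions, not a real SDM.

```python
import random
import statistics

rng = random.Random(0)


def sample_background(n):
    # Synthetic feature values at random locations (assumed range 0-40).
    return [rng.uniform(0.0, 40.0) for _ in range(n)]


def sample_occurrences(n):
    # Synthetic feature values where the species was seen; the preferred
    # temperature (22) and tolerance (2) are invented for this toy example.
    return [rng.gauss(22.0, 2.0) for _ in range(n)]


def spread_ratio(sample, reference):
    # Ratio of standard deviations: close to 1 means the sample looks like
    # the reference (noise), well below 1 means it is concentrated (signal).
    return statistics.pstdev(sample) / statistics.pstdev(reference)


background = sample_background(2000)
pseudo_absences = sample_background(2000)  # random points, same process
occurrences = sample_occurrences(2000)

pa_ratio = spread_ratio(pseudo_absences, background)
occ_ratio = spread_ratio(occurrences, background)
print(f"pseudo absences vs background: {pa_ratio:.2f}")
print(f"occurrences vs background:     {occ_ratio:.2f}")
```

The pseudo absences should be indistinguishable from the background, while the occurrences occupy a much narrower band of the feature. That narrowing is the kind of “signal” I am hoping a model can pick up.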

My curiosity is piqued. I should start poking around the literature and build my knowledge of the mathematical underpinnings of such approaches. There is actually some fairly current literature about pseudo absence data. Might have to read it. In the meantime, I’ll try a quick experiment to see what happens.

– the wooly mammoth
