machine learning for astronomers

There is a Monday seminar at Princeton run by the astrophysics graduate students that focuses on useful skills and knowledge around research, rather than research results. That's a good idea!

I gave the seminar today; I spoke about machine learning in astronomy. I started with my ML taxonomy and my recommendation to understand five beautiful, simple, and instructive examples: SVM, linear regression, PCA, k-means, and GMM with the EM algorithm. How's that for acronyms! Each of these five methods is so beautiful that everyone should know how it works and how it generalizes.
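If you want to play with these, all five are one-liners in scikit-learn. Here is a minimal sketch (mine, not from the talk) on synthetic data, just to show the shared API:

```python
# Minimal sketch (not from the talk): the five methods, one line each in
# scikit-learn, fit to synthetic data just to show the common API.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))            # 200 objects, 5 features
y_class = (X[:, 0] > 0).astype(int)      # fake binary labels
y_reg = X @ rng.normal(size=5)           # fake continuous labels

SVC(kernel="linear").fit(X, y_class)          # classification
LinearRegression().fit(X, y_reg)              # regression
PCA(n_components=2).fit(X)                    # dimensionality reduction
KMeans(n_clusters=3, n_init=10).fit(X)        # clustering
GaussianMixture(n_components=3).fit(X)        # density estimation (via EM)
```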

Each of these methods sits in a different taxonomic category (in order: classification, regression, dimensionality reduction, clustering, and density estimation). The first three are linear and convex, and each (for related reasons) can be generalized with the kernel trick. I discussed this in the second half of my talk, but my explanation went off the rails. I think I left everyone confused. Time to do more homework.
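For the record, here is the two-minute demo I should have shown, with kernel PCA standing in for all three kernelizable methods (this toy is mine, not a slide from the talk):

```python
# Toy kernel-trick demo (mine, not from the seminar): kernel PCA untangles
# concentric circles that linear PCA cannot, because the RBF kernel
# implicitly maps the data into a space where the two rings separate.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Z_lin = PCA(n_components=2).fit_transform(X)   # rings remain entangled
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)
# Along Z_rbf[:, 0] the two rings separate; along Z_lin they do not.
```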


#TESSninja, day 5

The day started with a break-out discussion about building a latent-variable structure for the incredible result by Guy Davies (Birmingham) that the power spectra of red giants in an open cluster lie on a one-dimensional locus. Details: He looks only at the overall envelope of the power spectrum, parameterized by 8-ish parameters. Those 8-ish parameters follow a one-dimensional locus of power laws with respect to each other, except one: the white-noise level, which it makes sense would differ from star to star. So he has a two-dimensional model that seems to fit, extremely well, every single stellar power spectrum in an open cluster observed by Kepler!
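To make that concrete: power-law relations among parameters are straight lines in log space, so the locus should show up as a dominant principal component there. A hedged toy (my construction, not Davies's actual model):

```python
# Toy sketch (my construction, not Davies's model): stars whose envelope
# parameters follow power laws in one latent variable lie on a line in
# log-parameter space; an SVD of the centered log-parameters recovers it.
import numpy as np

rng = np.random.default_rng(1)
nstars, npar = 50, 8
t = rng.uniform(-1, 1, nstars)           # the single latent dimension
slopes = rng.normal(size=npar)           # power-law exponents per parameter
logp = t[:, None] * slopes[None, :] + 0.01 * rng.normal(size=(nstars, npar))
logp[:, -1] = rng.normal(size=nstars)    # white-noise level: off the locus

logp -= logp.mean(axis=0)
s = np.linalg.svd(logp, compute_uv=False)
print(s**2 / np.sum(s**2))   # variance dominated by ~two components
```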

This discussion merged into a longer discussion, code-named Light-Curve Cannon, with contributions from many people, about how the time-domain behavior of stars on different time scales can be used to predict or infer stellar parameters. It is extremely promising that TESS-like time-domain data may deliver stellar parameters at precision comparable to contemporary spectroscopic modeling! Ruth Angus (Columbia) did a great job of bringing together the threads in these discussions: There are many papers to write.
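A minimal version of the idea (my sketch, not the group's code) is to regress a stellar label on crude time-domain summary statistics:

```python
# Minimal Light-Curve-Cannon-flavored sketch (mine, not the group's code):
# regress a stellar label on crude time-domain summary statistics of fake
# light curves in which the jitter amplitude depends on log g.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def features(flux):
    """Crude time-domain summary statistics for one light curve."""
    return [np.std(flux), np.ptp(flux), np.std(np.diff(flux)),
            np.mean(np.abs(flux - np.median(flux)))]

rng = np.random.default_rng(2)
logg = rng.uniform(1.5, 4.5, 300)                               # fake labels
fluxes = [rng.normal(0.0, 10 ** (-0.3 * g), 1000) for g in logg]
X = np.array([features(f) for f in fluxes])
print(cross_val_score(RandomForestRegressor(), X, logg, cv=5))  # held-out R^2
```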

The day ended with a wrap-up in which everyone contributed one slide and spoke for less than two minutes. Here are the wrap-up slides. They only give you the tiniest hint at all the things that happened this week!

Thank you to Dan Foreman-Mackey (Flatiron) and the Flatiron CCA staff and the Simons Foundation events staff for an absolutely great meeting. In particular, Foreman-Mackey's vision, leadership, technical abilities, and good nature got everyone participating and working together. That's community building.


#TESSninja, day 4

Today was a short day at #TESSninja for me, because I had [life events]. But in the morning, I spent some time working with [unnamed participants] and managed, through my efforts, to fully bork their code. I guess I really, really don't understand Python packages. I felt bad about that. You are supposed to move fast, break things, and fail fast, but I often participate in projects in such a way that I feel like I make them worse!

I also spoke with Ellie Schwab Abrahams (AMNH) and Ben Montet (Chicago) about linear regression to calibrate a Kepler light curve. You can think of calibration as a kind of regression (predicting the data using housekeeping data); we worked out what that would look like and got Schwab Abrahams started on gathering the housekeeping data.
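A hedged sketch of what calibration-as-regression could look like (illustrative; not the code we actually wrote):

```python
# Illustrative sketch (not the code we wrote): predict the light curve from
# housekeeping vectors by linear least squares, then remove the predictable
# part; the residual is the calibrated light curve.
import numpy as np

def housekeeping_detrend(flux, housekeeping):
    """flux: (N,) light curve; housekeeping: (N, K) housekeeping regressors."""
    A = np.hstack([np.ones((len(flux), 1)), housekeeping])  # include an offset
    coeffs, *_ = np.linalg.lstsq(A, flux, rcond=None)
    return flux - A @ coeffs
```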


#TESSninja, day 3

My plan for #TESSninja is to work on automated approaches to radial-velocity follow-up of TESS discoveries. I am bringing some new things to this question. The first is that I am not going to ask “when should I next observe this planet candidate?”; I am going to ask “I have telescope time right now, so which of my follow-up objects should I observe next?”. The second is that I think it is insufficient to make this decision only on the basis of the information obtained in this observation; it should be made based on the future discounted information that the observation unlocks or makes available, under assumptions about observing into the future.

This second point was a breakthrough for me. It comes from the following observation: Imagine that you are using RV measurements to measure precise periods, and you want period information. The first observation you make gives you no period information whatsoever: It only constrains the overall system velocity! So you would never make that first observation if you cared only about the immediate information gain on the period. You have to think about the future information-gain potential that your observation unlocks, discounted by your discount rate. Or even more complex objectives (yes, cash flow ought to be involved).
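To make the period example quantitative: the Fisher information matrix for a circular-orbit RV model is rank-deficient for a single epoch, so the period is formally unconstrained until later epochs unlock it. A sketch with made-up numbers (mine, not workshop code):

```python
# My sketch (not workshop code): Fisher information for the circular-orbit
# RV model v(t) = gamma + K sin(2 pi t / P + phi). One epoch gives a rank-1
# matrix (no period information); several epochs give full rank.
import numpy as np

def fisher(times, K=10.0, P=5.0, phi=0.3, sigma=2.0):
    t = np.asarray(times, dtype=float)
    arg = 2.0 * np.pi * t / P + phi
    # columns: d v / d (gamma, K, phi, P)
    J = np.stack([np.ones_like(t),
                  np.sin(arg),
                  K * np.cos(arg),
                  -K * np.cos(arg) * 2.0 * np.pi * t / P**2], axis=1)
    return J.T @ J / sigma**2

print(np.linalg.matrix_rank(fisher([0.0])))                  # 1: no period info
print(np.linalg.matrix_rank(fisher([0.0, 1.3, 2.7, 4.1])))   # 4: fully unlocked
```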

In other news, Guy Davies (Birmingham) made a nice point in discussion of the time-domain behavior of stars in an open cluster observed by Kepler: Because these stars ought to be the same age, and the same composition, and (on the red-giant branch) nearly the same mass, the asteroseismological (and jitter) signals ought to—in some sense—lie along a one-dimensional sequence in the relevant space. That's a great idea; I want to test that.


#TESSninja, day 2

The highlight of #TESSninja today was Ashley Villar (Harvard) showing the light curve of a supernova discovered in the K2 mission, with models over-plotted. It appears that the supernova is a type Ia, but the early-time light curve (and K2 was observing the field well before the explosion) is not consistent with any of the null (no-interaction) type Ia models. The early times require an interaction of the explosion with some nearby material, probably a companion star! This is an important discovery and (I think) a first!

Earlier in the day I worked with Ellie Schwab (AMNH) and Ben Montet (Chicago) on detrending the light curve of a particular low-mass star that Schwab is interested in. We discussed how to combine the full-frame-image information (where we know more about calibration and integrated photometry) with the long-cadence data (where we have a limited aperture and know less).


#TESSninja, day 1

Today was the first day of Preparing for TESS, organized by Dan Foreman-Mackey (Flatiron) and others. It is organized like the #GaiaSprint in that it is a hack week, starting with pitches and dedicated to getting stuff done. The crew pitched some great ideas on day one and then hacked. I am trying to work on algorithmic approaches to efficient radial-velocity follow-up.

Melissa Ness (Columbia) and Megan Bedell (Flatiron) started an interesting project to follow up anomalous stars in an open cluster: Do the stars with element-abundance anomalies also show anomalies in the time domain or in asteroseismology? Many other projects are working towards obtaining cleaned or calibrated light curves, although my heart sang when various people (notably Rodrigo Luger at UW) pointed out that we don't want to de-trend, we want to have a model that explains every light curve as a combination of spacecraft and stellar variability (and planets).


#siRTDM18, day 5

Armin Rest (STScI) gave a nice talk about time-domain astronomy, with stuff about finding Earth-impactors and also light echoes. After his talk, I told him about the insane project conceived by Rix, Schölkopf, and me to model the whole Milky Way as a set of flickering light sources plus a three-dimensional map of dust, using time-domain imaging at very low brightness. That's probably not possible! Rest is part of a big new sky survey for near-Earth asteroids, which will also do a lot of variable-star science.

After that, Sarah Richardson (Microbyre) talked about automating various aspects of phylogeny for various kinds of microbes. I was impressed by the robotics setups available to biologists! Her talk also contained a lot of biology-101 content for the physicists and engineers; I learned a lot (and felt, once again, my regret that I didn't take more biology in college!).

Late in the day, Josh Bloom (Berkeley) and I did some real-time decision-making at the Emeryville card room.


#siRTDM18, day 4

Today was road-traffic day at Real-time Decision Making at Berkeley. Jane MacFarlane (LBNL) and Alexandre Bayen (Berkeley) gave great talks about road dynamics. In MacFarlane's talk I learned that the mobile-phone location information delivered by the providers is posterior information, not likelihood information. And the priors are outrageously informative (for example, that every phone is on the midline of a known road!). That is good for the user (the mobile-phone owner), who wants navigation information, but not good for anyone trying to do hierarchical inference over phones or people! This is very related to the issues that Alex Malz (NYU) is working on in cosmology.
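A toy Gaussian version of the issue (mine, not from the talk): if the provider reports a posterior built with a strong prior, the hierarchical modeler has to divide that prior back out, which for Gaussians means subtracting precisions, and which is only possible if the prior is published:

```python
# Toy Gaussian sketch (mine, not from the talk): recovering the likelihood
# from a reported posterior by dividing out the (strong) prior.
import numpy as np

prior_mu, prior_var = 0.0, 0.1**2   # "every phone is on the road midline"
like_mu, like_var = 3.0, 1.0**2     # what the location data actually say

# The provider combines prior and likelihood and reports only the posterior:
post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
post_mu = post_var * (prior_mu / prior_var + like_mu / like_var)

# Undoing it requires knowing the prior (subtract precisions):
rec_var = 1.0 / (1.0 / post_var - 1.0 / prior_var)
rec_mu = rec_var * (post_mu / post_var - prior_mu / prior_var)
print(rec_mu, rec_var)   # recovers (3.0, 1.0): the likelihood
```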

Bayen focused on the influence of mobile phones on traffic, which has been immense! As mobile phones have gained traction with drivers, they have driven traffic patterns to a non-optimal Nash equilibrium, in which all paths from point A to point B take the same amount of time. But these same phones also create crazy new nonlinear dynamics, because all drivers get re-routed simultaneously onto a small number of alternate routes when something goes wrong. And it is like a repeated multiplayer game, because each routing company is constantly learning the dynamics induced by all the other companies! But this game is played out in the parameters of a set of differential equations, so it is crazy.

Things would be better if we could find a way to cooperate; this led to great lunch discussions with Josh Bloom (Berkeley). We discussed ways to capitalize on the fact that different drivers have different objectives. No existing apps capture this at all: They all optimize for the triviality of minimum expected travel time!


#siRTDM18, day 3

I arrived in Berkeley last night, and today was my first day at a full-week workshop on real-time decision-making at the Simons Institute for the Theory of Computing at UC Berkeley. The day started with amazing talks about Large Hadron Collider hardware and software by Caterina Doglioni (Lund) and Benjamin Nachman (LBNL). The cut from collisions to disk-writing is a factor of 10 million, and they are writing as fast as they can.

The triggers (that trigger a disk-writing event) are hardware-based close to the metal, and then software-based in a second layer. This means that when they upgrade the triggers, they are often doing hardware upgrades! Some interesting things came up, including the following:

- Simulating is much slower than the real world, so months of accelerator run-time require years of computing on enormous facilities just for the simulations. These need to be sped up, and machine-learning emulators are very promising.
- Right now events are stored in full, but only certain reconstructed quantities are used for analysis; in principle, if those quantities could be agreed upon and computed rapidly, the system could store less per event and therefore many more events, reducing the insanity of the triggers.
- Every interesting (and therefore triggered and saved) event is simultaneous with many uninteresting events, so in principle the system already saves a huge control sample, which apparently hasn't been fully exploited.

Of course the theme of the meeting is decision-making. So much of the discussion was about how you run these experiments so that you decide to keep the events that will turn out to be most interesting, when you don't really know what you are looking for!


data predicting data; bad Solar System

First thing in the morning, I met with Judy Hoffman (Berkeley) to discuss her computer-vision and machine-learning work. She suggested that auto-encoder-like machine-learning methods could be repurposed to make predictions from one kind of data to another kind of data on the same object. For instance, we could train an encoder to predict an exoplanet RV signal, given a Kepler light curve. Or etc! This appeals to me because it uses machine learning to connect data to data, without commitment to latent quantities or true labels for anything. She pointed me (relatedly) to ADDA (adversarial discriminative domain adaptation), a new kind of model for which she is responsible.
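Here is a minimal sketch of data predicting data, with an off-the-shelf network standing in for the encoder (this toy is mine, and it is not ADDA, which is adversarial; the shapes and names are made up):

```python
# Toy data-predicting-data sketch (mine; not ADDA): learn a map from one
# data type (a light-curve vector) to another (an RV curve) for the same
# objects, with a small feed-forward network as the encoder.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
n_stars, n_lc, n_rv = 500, 64, 16
latent = rng.normal(size=(n_stars, 3))   # shared (unobserved) physics
light_curves = (latent @ rng.normal(size=(3, n_lc))
                + 0.1 * rng.normal(size=(n_stars, n_lc)))
rv_curves = (latent @ rng.normal(size=(3, n_rv))
             + 0.1 * rng.normal(size=(n_stars, n_rv)))

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
model.fit(light_curves[:400], rv_curves[:400])
print(model.score(light_curves[400:], rv_curves[400:]))  # held-out R^2
```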

In the afternoon, Chiara Mingarelli (Flatiron) gave the NYU Astro Seminar about pulsar timing and gravitational radiation, expressing the hope and expectation that this method will deliver signals soon. She told a very interesting story about a false-positive detection that nearly went to press before they figured out that it resulted from residuals in the Solar System ephemerides. The SS comes in because you have to correct Earth-bound timings to a frame that is at rest (or moving at constant velocity) with respect to the SS barycenter.

This isn't the first time I have heard this complaint. The astronomical community really needs an open-source and probabilistic SS ephemeris, so we can use the SS model responsibly inside of inferences. Freedom-of-information act time?


adaptive observing programs

When God provides a meeting-free gap in the day, it is incumbent on the astronomer to use that time to do research. I stole some of that time today to work on preparation for my tess.ninja projects. My plan is to look into algorithmic approaches to adaptive observing campaigns, so that an exoplanet follow-up campaign from the ground can be simultaneously efficient at confirming true planets, measuring planet properties, and rejecting false positives, but also useful for long-term future statistical projects. In general, statistical usability and efficiency are at odds! These ideas are related to active learning and also to decision theory. One question: Would a ground-based telescope time-allocation committee accept an active-learning proposal?

I also did some information-theory-related math for Christina Eilers (MPIA): She is building latent-variable models for APOGEE spectra, but working in a low-dimensional basis that is a linear projection of the data; she needs measurement-uncertainty estimates in that low-dimensional basis.
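For the record, the propagation itself is simple: if the coefficients are a linear projection z = P x of the spectrum x, the covariance just gets sandwiched by the projection. A sketch in my notation (not Eilers's code):

```python
# Sketch in my notation (not Eilers's code): propagate per-pixel spectral
# uncertainties through a linear projection z = P @ x.
import numpy as np

def project_uncertainty(P, C_x):
    """P: (k, n) projection matrix; C_x: (n, n) pixel covariance."""
    return P @ C_x @ P.T   # (k, k) covariance in the low-dimensional basis

# For independent pixel errors sigma (shape (n,)):
# C_z = project_uncertainty(P, np.diag(sigma**2))
```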



[No posts for a while: I was on a short break.]

A huge highlight today came in the parallel-working meeting, where Elisabeth Andersson (NYU) got her code working to plot Kepler light curves folded on various periods, in particular periods that are integer ratios of known exoplanet periods. We are going to search for resonant signals. And warning: We are even going to look for 1:1 resonances, which might have escaped detection previously! We did various hacks to flatten the heck out of the light curves, which we might come to regret if we don't revisit them.
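The folding step itself is tiny; here is a sketch (variable names hypothetical, not Andersson's actual code):

```python
# Sketch of period folding (illustrative; not Andersson's code): fold a light
# curve on integer ratios p/q of a known planet period, including 1:1.
import numpy as np

def fold(t, flux, period):
    """Return the light curve sorted by phase at the trial period."""
    phase = (t % period) / period
    order = np.argsort(phase)
    return phase[order], flux[order]

# Hypothetical usage, for a known period `p0` and arrays `time`, `flux`:
# for p, q in [(1, 1), (2, 1), (3, 2), (1, 2)]:
#     phase, f = fold(time, flux, p0 * p / q)
```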



Tried to write; total fail. Doing stuff for my two jobs. Not complaining! Just not researching, not today.


comoving and coeval stars; and the pre-infall Sagittarius

At the Gaia DR2 prep meeting, I discussed comoving stars and related matters with Oh and Price-Whelan. We discussed moving from our DR1 work, which made use of marginalized likelihoods for catalog generation, to a parameter-estimation method. What would that look like? As my loyal reader knows, I prefer parameter-estimation methods, for both pragmatic and philosophical reasons. But once you go to parameter-estimation methods, there are lots of parameters you could in principle estimate. For example: You can look at the space-time event at which the two stars made their closest approach in the past, and how far apart they were at that point. If the separation is small, then coeval? That might be much more interesting than comoving, in the long run.
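Under the assumption of straight-line motion, the closest approach is a one-line minimization; here is a sketch in my parameterization (not Oh's or Price-Whelan's code):

```python
# Hedged sketch (my parameterization; not Oh's or Price-Whelan's code):
# for straight-line trajectories, minimize |dx + dv * t| over time t to get
# the epoch and separation of closest approach of two stars.
import numpy as np

def closest_approach(x1, v1, x2, v2):
    """Positions (pc) and velocities (pc/Myr) of two stars."""
    dx = np.asarray(x2, float) - np.asarray(x1, float)
    dv = np.asarray(v2, float) - np.asarray(v1, float)
    t_min = -np.dot(dx, dv) / np.dot(dv, dv)   # minimizes |dx + dv * t|
    sep = np.linalg.norm(dx + dv * t_min)
    return t_min, sep   # t_min < 0: the closest approach was in the past

# A small `sep` at some past `t_min` is the hint that the pair is coeval.
```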

At Stars group meeting, Allyson Sheffield (CUNY) and Jeff Carlin (LSST) showed us results on abundances of M-type giant stars in the Sagittarius tidal streams. They can clearly see that the progenitor of the stream had element-abundance gradients in it prior to tidal stripping. They also show that the stream matches onto the abundance trends of the Sagittarius dwarf body. But the coolest thing they showed is that there are two different types of alpha elements, which they called explosive and hydrostatic, and the two types have different trends. I need to check this in APOGEE! Sheffield also mentioned some (possibly weak) evidence that the bifurcation in the stream comes not from multiple wraps of the stream but rather from the tidally shredded object being a binary galaxy (galaxy and satellite) pair! I hope that's true, because it's super cool.


writing on dimensionality

Because of work Bedell did (on a Sunday!) in support of the Milky Way Mapper meeting, I got renewed excitement about our element-abundance-space dimensionality and diversity work: She was able to show that we can see aspects of the low dimensionality of the space in the spectra themselves, mirroring work done by Price-Jones (Toronto) in APOGEE, but with more specificity about the abundance origins of the dimensionality. That got me writing text in a document. As my loyal reader knows, I am a strong believer in writing text during (not after) the data-analysis phases. I'm also interested in looking at information-theoretic or prediction or measurement approaches to dimensionality.
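One cheap first look at dimensionality (a sketch of mine, not Bedell's analysis) is the eigenvalue spectrum of the centered abundance matrix:

```python
# First-look sketch (mine, not Bedell's analysis): the singular-value
# spectrum of the element-abundance table as a crude dimensionality probe.
import numpy as np

def abundance_variance_fractions(X):
    """X: (n_stars, n_elements) abundance table, e.g. [X/Fe] values."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)

# A sharp drop after component k suggests effective dimensionality near k,
# but measurement noise inflates the count; that is exactly where the
# information-theoretic approaches would come in.
```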