Examining the data

How science is done (II)

How closely can you look at your input?

Last week we mentioned Roger Griffin’s long-running program to determine orbits of binary stars.  It is unusual among modern astronomy programs for its longevity and the small number of people involved, among other things.  It’s also unusual in that Roger can (and indeed must) consider the reliability of each data point he admits into the process: each measured radial velocity affects the final calculated orbit.

One of the educational (and sometimes, we admit, entertaining) features of Roger’s papers is his discussion of observations, both his and others’.  Astronomy is unusual among the physical sciences in that data from a century or more ago is still very useful.  One certainly wouldn’t include a measurement from a nineteenth-century laboratory in a modern determination of the mass of the electron, but an equally ancient estimate of the brightness of a star can often be important in a modern model of stellar evolution.  Roger has weighed the reliability of radial velocities from century-old photographic plates against measurements from massive satellite-derived databases, and not always in favor of the latter.

Consider the Hipparcos satellite, now superseded by Gaia but important in its time: positions and brightness measurements for 2.5 million stars (to high precision for almost 120,000).  Examining each measurement manually for errors or, to be less extreme, unusual features would be impossible for any small team of workers in any reasonable time.  Of course this was realized, and various automated methods were used to flag suspicious values.  Still, the universe always finds unforeseen ways to depart from plan, and problems slip through.

Our astronomer met with an example of automatic error generation early in his career.  He was compiling a census of the Local Group, the galaxies concentrated around the Milky Way and Andromeda; at the time about thirty were known, maybe a few more.  He found one paper, however, listing over fifty.  Examining the extra candidates, he found that all of them were more distant galaxies with nearby stars superimposed on them.  The low radial velocities of the stars (all of which lie nearby, within the Milky Way) had been attributed to the galaxies, and the authors had automatically assigned them to the nearby Local Group.
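As a purely hypothetical sketch of how that kind of automatic assignment goes wrong (the function name and velocity threshold below are illustrative assumptions, not anything from the paper in question), a simple velocity cut happily admits whatever number it is handed, even when that number belongs to a superimposed foreground star rather than to the galaxy behind it:

    # Hypothetical sketch, not any survey's actual pipeline: a naive test
    # that calls any object with a small radial velocity a Local Group member.
    LOCAL_GROUP_VELOCITY_LIMIT = 400.0  # km/s; illustrative threshold only

    def is_local_group_candidate(radial_velocity_km_s: float) -> bool:
        """Naive cut: low radial velocity => assume the galaxy is nearby."""
        return abs(radial_velocity_km_s) < LOCAL_GROUP_VELOCITY_LIMIT

    # A distant background galaxy whose measured velocity actually comes
    # from a foreground Milky Way star superimposed on it (~30 km/s):
    measured_velocity = 30.0  # km/s -- the star's velocity, not the galaxy's
    print(is_local_group_candidate(measured_velocity))  # True: wrongly admitted

No cut this simple can catch the problem; only looking at the object itself (or at least cross-checking the measurement against an image) reveals that the velocity belongs to the wrong object.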

The astronomers in charge of the far larger current and planned surveys (Gaia, SDSS, LSST) are of course aware of this kind of problem and make strenuous efforts to check their data.  But the checking has to be automatic, computerized; no manual methods can deal with the sheer volume.  And no one has yet found a way to computerize Roger.

We’d especially like to find a way to computerize his attitude toward recalcitrant data points, expressed in a paper from 2013:  “There are some unusually bad residuals in Table XII; it is not possible to distinguish between stellar and instrumental origins for them, and it seems better to retain them in the solution of the orbit than to reject them for no better reason than that we don’t like them and wish that they were not there.”
