Friday, August 20, 2010

Astro2010 report and "virtual astronomy" in the next decade

The Astro2010 report is out, and the results are ... mixed. Most people interested in data-intensive astronomy, VO, etc., are delighted with the strong blessing of the LSST (as expected with a high probability). But what does the report imply for the field of data-driven, computationally-enabled astronomy, VO, astroinformatics, etc.?

Ten years ago, the previous decadal survey brought the concept of the VO to the forefront, and the enterprise took off. The vision was there: science transformed and enabled by the new technology of bits. Many other fields of e-Science look at astronomy with envy and appreciation; we are supposedly the poster child of the e-Science/Cyber-Infrastructure/X-informatics movement. We have gone through 7 Moore's law cycles since then, with a 100-fold increase in data volumes. The importance of the information and data technologies ought to be even more obvious today. So, what is the vision for the next decade?

The new report offers predictable and obvious blessings of data archiving, and VO is mentioned as a provider of standards for such activities. It pretty much says that the NSF and NASA should continue with their existing policies on this front. That is good. But that is all. It is all "just archiving" - hardly any intellectual substance worth bothering with. You can practically envision the committee yawning and moving on to the more exciting stuff. The word "computation" is taken to mean "HPC in service of ever more ambitious numerical simulations". Hardware is what matters. Never mind that facilities like LSST, WFIRST, etc., are critically dependent on different kinds of information and computation technologies and methodologies to fulfill their scientific promise and potential (not just to archive their data).

What about the new paradigm for doing science? What about the knowledge discovery and science that simply cannot be extrapolated from the old, pointed-observations, small-samples mode? What about the rapidly evolving methodology of science in the 21st century? It is all about the bits and knowledge, not so much about the atoms assembled in the form of expensive facilities. We are still in the midst of what may be the most profound transformation of science and scholarship ever, driven by the exponentially growing information and computation technology. Unfortunately, there is not much of an indication in the present report of a true understanding of this transformation of science, and not a trace of the vision it inspires. As a field, we may have stepped back. How did we get here?

The decadal survey fairly represents the views, priorities, and the zeitgeist of the astronomy community as a whole. The committee had some tough decisions to make, and they did convey, accurately, in my opinion, what the astronomy community as a whole thinks about VO and such things. VO is (at best) regarded as "just archives", and this opinion is not entirely wrong, as things stand. So I can understand where the Astro2010 committee is coming from, even though I may wish that they had shown at least as much vision and understanding in this arena as their predecessors did 10 years ago. Sure, some relevant testimony to the committee was provided, and a couple of pertinent and thoughtful white papers were submitted - lost among the hundreds of others, many of which can be fairly described as so much self-serving propaganda, so it would be easy to miss the good stuff. But the cold, cruel fact is that the astronomy community's appreciation, enthusiasm, and support for the VO idea evidently have not grown over the past decade; rather, they seem to have declined, and they may well continue in the same direction.

The problem, I think, is that we, the "greater VO" community, have failed miserably to convey the importance and the potential of these technologies and methodologies to the broader astronomy community in a convincing and compelling fashion. Certainly, a lot of hard, unglamorous, and necessary work has been done. However, we have not delivered on the promises of a new path to scientific discovery, envisioned 10+ years ago. I have argued elsewhere why I think this is, and I accept my own share of the blame. But how do we find a constructive way forward?

This matters. The funding agencies are supposed to take their hints from the decadal report, everyone follows the money, and students and postdocs are wise to enter the well-supported fields and projects. This is not just a U.S. problem - these dominoes tend to fall in order, and far away.

Given this situation, what is the long-term future of the VO? Will it become a fully owned subsidiary of the LSST Corp.? Maybe that is not such a bad thing, considering the management and the political skill the LSST leadership has shown, and the growing collection of e-Science talent the LSST project has assembled. We have already seen a number of excellent young people who were engaged in the formative stages of the VO move on to scientifically greener pastures elsewhere. Maybe VAO will devolve into a standard-setting body and a debating society, much like IVOA, say, with the exciting development and applications of discovery tools happening elsewhere.

Or maybe we need to think about the broader evolution of our field, and the methodologies it has to develop in order to function effectively in the era of exponential data abundance, in a project-independent way. After all, this is all about a new, universal scientific methodology, not about the particular needs of any one given project. Imagine a situation where there are no independent detector, instrument, or telescope engineering and development efforts, other than through a few blessed mega-projects. Is this really an optimal path for our field? Or is it simply the only available path?

In any given situation one must operate within the framework determined by the political realities. The astronomy community produced the Astro2010 report, and it should now stand behind it, such as it is, even though there are several sub-fields which feel shortchanged by it. That is the playing field.

So, where does the VO/astroinformatics community really want to be 10 years from now, and how do we get there?

What do you think?


  1. Ah! I replied, but must have clicked just to the right of where I intended and sent an email instead:

    "300 char limit? Says a lot about AstroInformatics right there... Reject premise. Decadal surv. steers new initiatives. VAO exists. Declare victory for 2000 surv. & tackle the issues. 7 Moore's cycles is red herring - will take many decades to reengineer the oldest science."

    Not sure I have much to add at the moment, but would have said it with much more poetry and zing (or maybe schadenfreude and bling).

  2. What has the VO provided beyond "just archive"? ... or more fairly, what has been developed beyond just data management, with a smattering of visualization, rhetoric about 'federated data,' and technology-driven implementation of 20th century science analysis tools? I sincerely don't know what the full answer to this question is, but I bet it is "little or nothing has been developed and made generally available, easily usable, and understood." So I agree with your assessment of the failure of the community, but think it goes well beyond just conveying importance.

  3. Gentlemen, thank you for these concise and clear illustrations of the perceptual problems we have to face, from both sides of the VO compound.

    The goal here is to start a meaningful and thoughtful discussion. While I am not smart enough to understand how the blogspot format limits say anything at all about the field of astroinformatics, I would be happy to add to the post authors list anyone who wishes to make a more extended comment. Just email me. Thanks!

  4. Astronomy is not and never will be a top-down exercise. The Decadal Survey is one useful exercise in setting priorities for certain funding streams. One of the significant findings this time was to free up more flexible funding for medium-size projects on both the space and ground-based sides.

    It is clearly (and not unexpectedly) the intent that software funding be tied to specific projects. If LSST and JDEM (er...WFIRST) need interesting new compute resources then they are expected to budget for this. There was also a clear statement that the ground-based centers should continue to pursue archive initiatives similar to the space data archives. Maybe NSF will budget for this activity.

    Archival data is certainly a prerequisite for astroinformatics, whether or not more money is expended on the "atoms" of new instrumentation and telescope platforms. As you say though, archives aren't enough.

    Ultimately it all comes down to experimental design. The time domain will not yield - no matter how clever the rapidly evolving methodology of our bits and knowledge - without synoptic surveys. No amount of investment in photon counting algorithms or related informatics matters in the absence of X-ray and neutrino telescopes to provide event lists.

    VO efforts over the past decade have been productive if not (yet) world-changing, but the past is prelude. VAO funding has just kicked in. We have the opportunity to build interesting astronomical infrastructure. Informatics and semantics will have to justify themselves through clear science use cases just like lots of other potential "virtual" activities. If informatics is useful it will succeed in proportion to what science it supports. Political realities may accelerate or retard the future pace of discoveries, but politics can't keep useful techniques from being developed and promulgated in the first place.

    Is there a critical science case for AstroInformatics? Make it clear - and stay on message - and make your case again and again. It isn't about previous conferences and reports - it's about the future reports and presentations and posters and blogs. One person's "self-serving propaganda" is another's painstakingly framed science case for critically needed facilities.

  5. Here is my view, George. Other societies I am in have also toyed with the idea of declaring victory and returning home.

    We're in a downturn situation right now, for science funding as well as otherwise. That means overtly and beneath the surface a good deal of restructuring - and opportunity. What can AI (AstroInformatics) offer? Off the top of my head:

    - Citizen science is a paradigm shift.
    - Exascale - good to be associated with these industry trends. Likewise for GPGPUs.

    The mood (in Europe certainly) is turning towards big projects - Joint Programming Initiatives for socially driven R&D, FET [Future Emerging Technologies] Flagships for computing innovation, and - now underway - Future Internet, among others.

    Ambition and big ideas spanning multiple fields as here in AI are needed.

  6. I think the paradigm shift and big ideas would be helped by a tool set that will make astroinformatics easier to learn and use, a toolbox that can easily grow to different platforms and usage, but unified by standards like SQL and VO.

    A lot of working astronomers already know the CasJobs interface that was introduced with the SDSS. For many, it is the only database interface they have used. I suspect they would like to install CasJobs on their own resources with their own data. I suspect that a port of CasJobs to Linux/MySQL would see immediate uptake from pent-up demand.

    Following that success, I would suggest
    -- Python API
    -- API and library for data mining
    -- NED and Simbad and ADS and SDSS
    -- 1000's of VO datasets
    -- install at home or in the cloud
    -- with Crossmatch and Join done properly
    -- with Images and Spectra and Light Curves
    -- with Web/App front end, sharing, wiki

    and finally …
    -- How to use the product for the "data management plan" now required of all NSF proposals.

  7. I've seen a number of comments that the 2010 Decadal Review is pragmatic, unlike the 2000 Decadal Review. Isn't that what we should have been expecting, though? The 2000 DR was born out of a preceding decade of prosperity, a Democratic administration, hope and promise on the international scene, whereas the 2010 DR is the product of a decade of Republicanism, 9/11, Iraq, Afghanistan, recession. It's hard to be inspirational and visionary when the wolf is at the door.

    There is positive language in there - "the confluence of stunning discoveries, technological advances, and powerful ideas has made this a special time in astronomy and astrophysics" - but isn't this true of any decade in science since WWII? The review acknowledges that the sociology of astronomy has changed, that it is "more collaborative, more international, more interdisciplinary" and that "in some cases [data] even dominates the impact of a facility". Unfortunately, though, this really seems to be just lip service: the older generation acknowledging that the "youngest scientific minds" do things differently. It would be interesting to know if the median age of the committee was significantly different between 2000 and 2010.

    The meat of the review, however, is still very traditional. Computational astrophysics refers to simulations and modelling, not the Cloud or mobile computing. What astronomy would be feasible in a decade's time with a million-strong distributed sensor network? LSST is the number one recommendation (which I'm pleased about) but it's a safe project: contrary to opinion, LSST is not rocket science. The data rates are a third that of LHC (operational a decade earlier) and the transient rates are about 4 a second. When the public data sets become available in 2020, it will not register - other projects and sciences will have trumped it by orders of magnitude. Something like SKA should be the visionary, revolutionary mission that drives astronomy truly into the 21st century and all that that entails.

    What about the VO, then? Is the lack of mention troubling? The VO is a success when its usage is commonplace and goes unnoticed. Unfortunately this is not yet the case - there was too much pitch and spin of a new concept, too little truth telling that we need to figure out the rules of this brave new world, too much trying to fly when we cannot even walk - but we will get there. It is concerning, though, that the review has given a prima facie excuse for not working with VO standards - "the format should be Virtual Observatory-compliant where this is cost effective". You can hear it now: "we said in our proposal that we would be VO compliant (and that's why we got the money) but it turned out in practice not to be cost effective."

    So the Decadal Review could have been the shining light for international astronomy, and there are quite a few places out there which need such a hope, but it turned into something complacent. If we really are "on the threshold" then it should not only have talked about the paradigm shift that data-intensive astronomy/science is driving - "the style of carrying out research is different" - but actually put its money where its mouth is and laid out a truly innovative program to "maximize future scientific progress".

  8. I am especially interested in knowing why the Decadal Survey Committee (apparently) decided to ignore several position papers submitted to them on the emergence of Data-Intensive Astronomy (Astroinformatics) despite numerous national study reports (from the NSF, National Science Board, National Academies, the Office of Management and Budget, and the White House Office of Science and Technology Policy, just to name a few of the "minor" organizations) that have endorsed this new direction in scientific research. Research in this all-important national priority research area will be fundamental for astronomers to reap the maximum scientific research potential from the enormous sky surveys of the future (Pan-STARRS, LSST, etc.).

    WWTT??? (What were they thinking?)

    For reference, check out slide #13 in my presentation at this recent workshop on Computational AstroStatistics at the Harvard CfA:

