
Responses

Questions:

1. Assuming that connections to high performance Internet backbones evolve into a commodity service (to be purchased by universities from various commercial providers, gigaPoPs, or Abilene):

2. Assuming that NSF/ANIR phases out its direct support of high performance connections:

3. Assuming increased funding supporting applications and middleware:

4. Assuming that a lack of network engineering and network equipment resources is inhibiting SDSC's and NCSA's abilities to fully support the end-to-end requirements of partners, e.g. with respect to distributed applications and distributed computing:


Respondents:


Responses:

 

Andrew A. Chien

1.a. The primary challenges facing folks who wish to prototype
grids and develop application-enabling middleware are the availability
of bursty high performance, guaranteed high capacity with reservations,
and the availability of high speed onramps. These limitations
not only inhibit the development of a large number of significant
novel wide-area science and engineering applications, they also inhibit
the development of novel applications for future generations of high
performance network services. It's unclear if the commercial services
can provide the high bandwidth, reservable communication capacity
*required* to perform scientific and controlled experiments and to
prototype both application and systems for the future.

b. High speed on ramps for partners to WANs, high speed paths to
other partner sites and leading edge sites. Lack of stable funding
to support development of these infrastructures both to support
the PACI's efforts to become Grids and to become high performance
Grid testbeds.

c. I would hope the NSF would take the lead in supporting the
infrastructure to make the above science, engineering, and grid
research possible. The bursty, high bandwidth use of the high
end applications makes setting up occasional high bandwidth connections
with the commercial providers difficult economically (big $$$'s) and
makes the obstacle to experimentation far too high. If there were
a facilitated way that NSF could pool users and share a fixed
capacity on a scheduled basis (batching a la the successful MOSIS
microelectronic fabrication program perhaps) that would be very
helpful. However for this to work well, we would need for this
to be low overhead to get access, connect almost everywhere, and
be subsidized to be quite inexpensive.

d. I'm not sure about SLA's, except to say that I don't know how
you can tell the service is satisfactory unless you have some monitoring
and performance data and the only way to ensure that you can get
good service is to tie payment to the quality.

2. I don't like this assumption, as I believe it will lead to
a stall in high end network research. Future networks will be
even more predominantly hordes of 28.8K users. Not a bad thing
but we'll miss a golden opportunity with respect to enabling high
performance distributed computing for science, engineering, and
other large scale computing challenges of national, global, or
even regional interest.

a. Universities in general are unlikely to be able to find the
money. They're strapped just trying to wire 10 and 100Mbps to
dorms, departments, and offices.

b. See my MOSIS comment above. Regarding low bandwidth utilization
concerns, one possibility is to allow low speed users on the network
but at a lower priority and allow them to be squeezed out when high
bandwidth users need the bandwidth. Thus, the network can be a
commuter lane which can be monopolized by high performance applications
in priority over the lower speed users. May need some modest routing
advances to make this easy.

c. Already starting to happen. Again, this will be bad for high end
networking research unless we can find a way to fund this. The cost
of experimentation at the high end will explode (because we're
climbing an exponential in capability). This means that to experiment
for N years out, we may need 10 or 100x the bandwidth, and I don't
see how the PI's or institutions can pay 10x or 100x in $$$'s.

3.a. Easy to use high bandwidth wide-area computation. Manipulation
of 10 terabyte data sets (yes terabytes) by ordinary users on a routine
basis through high end graphical interfaces. In 5 years, a TB of
storage will be approximately $2K. Globalized network
services and information with only modest concern for locality.
Unified data and service access around the country and world.
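
As a rough illustration of what routine manipulation of a 10 terabyte data set implies for the network, the sketch below computes idealized transfer times at the link speeds mentioned in this response (28.8K, 10 Mbps, 100 Mbps). The 622 Mbps entry, the decimal definition of a terabyte, and the assumption of a fully utilized link with no protocol overhead are illustrative assumptions, not figures from the response.

```python
# Idealized transfer-time arithmetic: size in bits divided by link rate.
# Assumptions: 1 TB = 10**12 bytes, the link is fully available, and
# protocol overhead is ignored (efficiency = 1.0 unless stated otherwise).

def transfer_time_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Return the time to move size_bytes over a link at the given rate."""
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


DATASET_BYTES = 10 * 10**12  # the 10 terabyte data set from the text

LINKS_BPS = {
    "28.8 kbps modem": 28.8e3,
    "10 Mbps": 10e6,
    "100 Mbps": 100e6,
    "622 Mbps (assumed high-end link)": 622e6,
}

if __name__ == "__main__":
    for name, rate in LINKS_BPS.items():
        days = transfer_time_seconds(DATASET_BYTES, rate) / 86400
        print(f"{name}: {days:,.1f} days")
```

Even at 100 Mbps the idealized figure is more than nine days, which is one way to read the concern about high speed onramps raised above.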

b. Widespread deployment of *experimental infrastructure* for wide
area computing, storage, networking, and application experiments.
These resources
need to be made network accessible, available with minimal overhead,
and available at minimal cost. Organizations like the current PACI's
have difficulty doing this because their platforms cost so much.
Further, they are monitored for "system utilization". Clusters are
the natural technology for this and recent advances in communication,
security, etc. make it possible to build versatile resources that
can be safely shared. Given this infrastructure, the technologies
will be developed.

4.a. I don't really know how to answer this question, as I don't
understand it. If the NSF wants true distributed partnerships the
networking is as important as the large platforms. We all appear
to agree that the partnerships have the potential to be even more
successful and dynamic in advancing the progress of science and
engineering, and further they are the right model to influence
the directions these communities must take to reap the bounty
of future technologies. Given that, I believe there is no choice
and the NSF and perhaps other agencies must find the wherewithal to
fund this activity properly. Perhaps these resources could be found
and their deployment overseen by networking and application researchers
at the partner sites (not the partner networking organizations).

b. I believe I answered this above.

 

Dick Crutcher

1.a A program that would award funding or other considerations based on competitively reviewed proposals for innovative and developmental use of high performance backbones would help. In the case of funding, it would undoubtedly be true that some connections that would be essential for development work by PACI groups would not be available or would have to be paid for. Having a mechanism for securing such funding would be essential to the success of the PACI programs that depend on high speed networks. Another aspect would be that some programs might require very high speed access, but for very limited periods of time. A process by which such access could be scheduled, so that other users were routed to slower network connections, might enable the R&D work of the PACI teams to proceed. Another similar example would be a mechanism to ensure quality of service for R&D work that required it.

2.a (1) Extension of high speed networks to PACI partner sites. The vBNS connected a relatively small number of sites and apparently worked well if everyone involved in a project were so connected. In my case, which involves remotely located instruments, vBNS connections were not available, and there was no large organization with deep pockets that could provide the cost sharing funds needed for a vBNS proposal.

(2) Quality of service. Although distributed computing or remote instrument control would not generally need very high speed network connections continuously, it would be essential to have guaranteed access for R&D projects for relatively short periods, that could be scheduled in advance, at minimum stated rates.
b. The NSF might be in a position to provide necessary funding that would enable the above. Funding for extension of high speed networks to sites off the beaten paths would be especially important.

c. SLAs obviously could be a mechanism for obtaining the quality of service guarantees that would be necessary.

2. a Universities tend to look at things from three points of view: (1) spending their own money to enable the work of a large number of people, such as students; (2) providing seed money for a small group of people that might generate large grants in the future; or (3) funding something of very high visibility and prestige that will bring a bright and favorable spotlight. Only the last two seem applicable. As seed money, the project itself must involve science that looks like it might attract funding from the individual discipline if a trial or an R&D effort is successful. Some PACI projects might qualify. Probably the more promising is selling the idea that leading edge research on development of the next generations of computing applications that would depend on future very high performance networks would attract the limelight and be worth financial support.

b. The best way would be to set up an infrastructure such that there could be limited periods of guaranteed very high bandwidth service for bursts of R&D work, with the "excess" bandwidth being available to the general community when not being so used.
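
One way to picture scheduled periods of guaranteed bandwidth, with the remainder left to the general community, is a simple advance-reservation calendar. The sketch below is a minimal illustration under stated assumptions: the `Reservation` type, the function names, the hour-based times, and the 155 Mbps capacity in the usage example are all hypothetical and do not describe any existing network service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """A guaranteed-bandwidth booking over the half-open interval [start, end)."""
    start: float  # hours (hypothetical time unit)
    end: float
    mbps: float


def reserved_at(reservations, t):
    """Total bandwidth guaranteed to R&D reservations at time t."""
    return sum(r.mbps for r in reservations if r.start <= t < r.end)


def peak_reserved(reservations, start, end):
    """Peak reserved bandwidth anywhere in [start, end).

    Reserved bandwidth only rises at reservation start times, so it is
    enough to check the window start and each start inside the window.
    """
    points = [start] + [r.start for r in reservations if start < r.start < end]
    return max(reserved_at(reservations, t) for t in points)


def try_reserve(reservations, request, capacity_mbps):
    """Admit the request only if the link is never oversubscribed."""
    if request.end <= request.start or request.mbps <= 0:
        raise ValueError("reservation must have positive duration and rate")
    peak = peak_reserved(reservations, request.start, request.end)
    if peak + request.mbps > capacity_mbps:
        return False
    reservations.append(request)
    return True


def available_to_community(reservations, t, capacity_mbps):
    """The 'excess' bandwidth left for general use at time t."""
    return capacity_mbps - reserved_at(reservations, t)
```

For example, on a hypothetical 155 Mbps link, a 100 Mbps booking from hour 9 to hour 11 would be admitted, a second 100 Mbps booking from 10 to 12 would be refused, and outside the booked hours the full capacity would remain available to everyone else.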

c. I know of no plans for this.

3.a I am working to support remotely located radio astronomy synthesis array radio telescopes. What we would like to do is transmit the data in real time (a fairly low bandwidth, continuous application) to a supercomputer center, have the data processed into first-order images in near real time, and have users located anywhere in the country be able to visualize the (usually) 3D data sets and to change the observation based on what is being seen. Later, more sophisticated data processing will take place. The need is for the results to be available to users not located at the supercomputer center for real-time steering of the compute-intensive calibration, deconvolution, mosaicing, visualization, and scientific analysis. Astronomers generally do this now on local workstations, which require months of time to process hours of data. So the vision of the future is for astronomers to have telescope data available at supercomputer centers in near real-time, to process those data in times comparable to data acquisition times with the use of remote supercomputers being largely transparent (virtual co-processors in desktop workstations), to bring in other data sets from distributed archives, to get all the results back on their workstations for analysis, and to work collaboratively with colleagues at other sites. Such a seamless networked telescope/computer/archive system would significantly improve the efficiency and productivity of telescopes and astronomers.

b. The network infrastructure is the major missing part of the system that we have no ability to acquire or build on our own.

4.a The vision is GRID computing. If we cannot effectively use distributed resources and parallel computing systems, that vision becomes myopic at best. Using Alliance and NPACI clout to help obtain needed network resources is essential. Unless the remote supercomputers can appear largely transparently as super co-processors in users' workstations, it will not be worth the trouble for most people to use remote supercomputers. Hence, the MAIN responsibility of the Alliance and NPACI must be to enable their vision. If (as is usually the case) the bottleneck is network access, the Alliance and NPACI should take a pro-active role in solving that problem, not just shrug shoulders and say too bad. If (as is often the case) the WALL clock time to run a job is comparable to the wall clock time of running on a departmental-level system, a high Mflop rating when the job is run is irrelevant. Resource scheduling so users actually see supercomputer WALL CLOCK speeds should be a main responsibility of the Alliance and NPACI.

b. Priority 1 should be effectively scheduling so real (WALL clock) supercomputer speeds can be achieved. This does not require significant funding. One way to help achieve this would be to keep workstation class jobs off the systems. Another might be to set up wall clock schedules, so user A would know that he would have the resources he needs between (say) 4-8 pm.

Priority 2 should be working toward improving network access for Alliance and NPACI researchers. This is equally important as 1, and the Alliance and NPACI should use their clout to help. But ultimately I think the NSF simply has to recognize the importance of this and fund it.

Tom DeFanti

1.a The PACI program serves computational scientists and engineers by advancing the deployment of National-scale computing and communication services. The success of PACI program grantees is measured by how well scientists and engineers use these technologies. However, both the current vBNS and the Abilene effort have been serving the needs of campus CIOs as an expedient means to rapidly install interoperable, nearly universal high-performance networking between R1 institutions. The vBNS program has been extremely helpful in creating a market for high-bandwidth communications in that it shared the costs of local loops and backbone attachment with universities. ANIR should now focus on awards to scientists and engineers who use the infrastructure provided by the NSF with the clear indication that these awards help pay for recurring network costs. These awards should require a senior academic scientist as PI, with colleagues and the campus CIO as co-PIs. Clearly, high-bandwidth use of the networks is a key criterion, whether it be for collaboration, large-scale data access, distributed computing, or education. Proposals that include a component of using advanced/emerging network technologies (like security and Quality of Service) or advanced Grid techniques (like multi-site parallel computing, visualization/virtual reality, etc.) should be encouraged. Large cooperative agreements with PACI leading edge sites and partners could specifically focus on Grid services development and deployment, with smaller grants to research universities helping with the costs of networking. As with most NSF infrastructure programs, substantial cost sharing would be a requirement.

b. Interoperability of networks (IP/ATM and IP/Sonet), deployment and support of Grid services, interoperability with other Agency and foreign networks, advancing QoS and multicast.

c. ANIR should lead the Federal Agencies and international networks by funding applications that use the technology as well as the technology itself. A leadership role in training and outreach for users of the technologies would be an appropriate goal for NSF's support as well.

d. SLAs are needed to enable advanced users to tune application performance. The networking community needs to work with scientists and engineers to create the tools they can use rather than hide this information. NSF really needs to help bridge the gap between computational science and network engineering, just as it has encouraged computational scientists and computer scientists to collaborate.

2.a Essentially the same methods used to justify high-performance computing for leading computational scientists: the awarding of competitive proposals (not more than one to a university) to cost-share the networking costs, particularly favoring those using advanced services coupled with peer-reviewed science and engineering applications.

b. The high-performance networks depend on a profile of highly bursty traffic to avoid melt down. What should be encouraged is more applications, not high sustained utilization.

c. Cost-recovery is likely; my recommendation is to make it a competitive way of life by assuming, as we consider the cost of computing equipment, that network access is essential to science and engineering. Commodity traffic should be provided by the university, of course, but high-performance applications should at least partially pay the extra costs through grants.

3.a Right now, my discipline (shared virtual worlds) is desperately in need of guarantees of bandwidth and latency. It will not become a practical technique for collaborative scientific discovery without QoS and operating system enhancements that take advantage of multiple flows with differing characteristics (such as unicast/multicast, secure, lossy/lossless, priority, and so on).

b. Supporting the Grid development and deployment across agencies (and internationally) is the most effective leveraging scenario, in my opinion. As a large scale effort, this will help drive the technology development and implementation, lead to quick adoption of standards, and encourage outreach activities, leveraging off and integrating with the existing structures already supporting large scale computing, visualization and data access.

4.a Alliance and NPACI should develop and deploy the Grid (cooperating with DARPA, NASA, DoE).

b. PITAC addressed these issues; its recommendations should be followed and implemented.

Andrew Grimshaw

1.a Middleware research efforts will always need to be focused on the next generation of capability so that the software can be ready when the infrastructure becomes commonplace. Therefore the middleware research community (i.e., metasystems researchers) will need access to the highest bandwidth systems. Further, these high-speed networks will need to be connected to an interesting set of resources, e.g., the SC centers, large scale clusters, interesting databases, instruments, etc.

b. Human infrastructure. The physical infrastructure is in pretty good shape (of course it can always be improved), but one of the most difficult challenges today is finding first class programmers and researchers that are intimate with Unix, networking, etc. AND then paying them with university salaries. These people are the same ones that industry wants most. As an anecdotal example, I just lost one of my best systems guys to a beltway bandit - they are paying him TWICE what I was paying him - or could pay him. Indeed he is now making more than all but the most senior faculty in my department. Similarly, SDSC has been losing system staff to industry, and now has almost a dozen open slots.

Another critical issue is service fungibility - to make metacomputing work most effectively requires that we move tasks from place to place. Current policy is that a user has unit allocations on a particular machine. We cannot "trade" the allocations around the system. By setting up trading, and pricing resources based on their value to the community (i.e., using 32 nodes on a 1000 node machine would "cost" more than 32 nodes on a 64 node machine -- discouraging small jobs on the large resource) we could improve average response time. (Classic M/M/k queue vs. k independent M/M/1 queues.)
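
The queueing comparison cited in parentheses can be made concrete with the standard closed-form results for M/M/1 and M/M/k queues (the latter via the Erlang C formula). The sketch below is a minimal illustration, assuming Poisson arrivals, exponentially distributed service times, and k identical machines; these are textbook idealizations rather than a model of real supercomputer workloads, and the function names and example rates are hypothetical.

```python
from math import factorial


def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for one M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def mmk_response_time(total_arrival_rate, service_rate, servers):
    """Mean time in system for one shared M/M/k queue (Erlang C)."""
    offered_load = total_arrival_rate / service_rate  # a = lambda / mu
    utilization = offered_load / servers              # rho = a / k
    if utilization >= 1:
        raise ValueError("queue is unstable: utilization >= 1")
    # Erlang C: probability that an arriving job has to wait.
    tail = (offered_load ** servers / factorial(servers)) / (1 - utilization)
    head = sum(offered_load ** n / factorial(n) for n in range(servers))
    p_wait = tail / (head + tail)
    mean_wait = p_wait / (servers * service_rate - total_arrival_rate)
    return mean_wait + 1.0 / service_rate


if __name__ == "__main__":
    k, lam, mu = 4, 0.8, 1.0  # hypothetical: 4 sites, each 80% utilized
    print("k separate M/M/1 queues:", mm1_response_time(lam, mu))
    print("one pooled M/M/k queue: ", mmk_response_time(k * lam, mu, k))
```

With the same total load, the pooled queue gives a lower mean response time than the separate queues, because a job never waits at one machine while another sits idle. The pricing scheme described above is not modeled here; the sketch covers only the pooling effect.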

c. From the human resources side it is hard to see what you could do. At UVA we have considered setting up an independent unit to hire the folks so that we could side-step the university's salary limits. To do that though means that the funding agencies (e.g., NSF) recognize that good people cannot be had for $55K - that they are going to cost $80-$110K. It is a lot of money, but except for the dedicated few willing to make their families suffer we cannot get the people we need.

d. I think that they are critical for the networks. If the telecoms do not have contractual obligations they will provide the minimum they can.

As to the compute and data resources, see my comment above about the need to be able to trade resources.

2.a My university will not pick up the cost. We are a state institution. The CIO will not be able to justify bandwidth in excess of what the university community as a whole needs. They have matched so far because it was matching, and there were several other researchers who needed improved bandwidth.

b. I think that the characterization above will change over time. As we move from stunt applications to an integrated environment there will be much more "background" traffic as the underlying metasystem moves binaries around, copies data files, provides remote data access etc. I think the "low bandwidth utilization" is a result of the software to make the metasystem easy to use not being ready.

c. We are not moving that way. Nor can I foresee a time when that could possibly happen. If it did, then poor departments (e.g., English, History) would get lower quality service. At UVA those are the powerful departments.

3.a The environment for high performance scientific computing will change significantly over the next five to ten years. Today users are very aware of where physically their programs are executing, where their data resides, and when a processor fails. In the not-too-distant future that will not be the case. Instead users will manipulate objects, applications, data files, instruments, ongoing simulations, etc. The underlying system will manage scheduling, accounting, data and binary migration, security, and fault-tolerance in a heterogeneous, physically distributed computational environment composed of physical resources controlled by multiple administrative units.

The state of the art

Today users typically must decide where to execute their applications. They can choose to run it on their workstation (if it will fit), on a local high performance machine, e.g., at a university or lab-wide computing center, or they can execute at a supercomputer center. The choice involves many trade-offs. First there is the scheduling issue: which choice will result in the fastest turnaround? This is particularly acute when the user may have accounts at multiple centers. How is she to know which is likely to result in the best performance without manually checking each? Then there are the inconveniences of using remote resources. The data may first need to be physically copied to a remote center, and the results similarly copied back for analysis and display. This data copying may be further complicated by the need to synchronize with collaborators to have them copy their data as well. Then there are the administrative difficulties of acquiring and maintaining multiple accounts at multiple sites.

The net result is that users today spend a great deal of time manually structuring and managing their computations. In effect today's computational scientists must become high-performance computing experts in order to achieve good performance on their codes. Rather than trying to understand the science of their application they are spending their time trying to outguess the computer.

The Future

The resources contained in future metasystems will include compute engines (from high-end supercomputers to workstations and personal computers), digital libraries (ranging from traditional text-based document collections to very large scientific databases), physical devices and specialized instruments (e.g., telescopes, microscopes, machine tools, satellite imaging systems), and high-speed communication links. The resources will be owned by many different individuals and organizations each with different objectives, and few of them altruistic enough to just give away their resource.

Metasystems of the future will support the illusion of a single virtual machine to users, a virtual machine that provides secure shared object and shared name spaces, fault tolerance, improved response time, and greater throughput.

The potential benefits of a successful metasystem are enormous and include: (1) more effective collaboration by putting collaborators in the same virtual workplace; (2) higher application performance due to parallel execution and exploitation of off-site resources; (3) improved access to data and computational resources; (4) improved researcher and user productivity resulting from more effective collaboration and better application performance; (5) increased resource utilization; and (6) a considerably simpler programming environment for the applications programmers.

b. Human resources.

4.a I think that this is a middleware problem. I think that Legion and Globus will alleviate this problem by simplifying the use of the distributed resources. What is needed from the centers is support for the middleware software (people on the ground), and training of the user community in the use of the middleware.

b. People on the ground at the centers working on the middleware as partners with the development teams.

I don't know where the money should come from. Let me state though that if the metasystems projects fail (in the sense that they do not significantly lower the barriers to use of the distributed resources) then the very high speed networks will only be used by a few very dedicated scientists .... and in a sense the networks will fail too.

 

Ron Hutchins

1.a I don't believe that the networking requirements of advanced computing will be met with off the shelf commercial services in the near term, at least not in a cost effective manner. It will require a collaboration, as in the past, between research and education networking and the research computing areas to build the critical mass to support a network infrastructure providing both high performance production services and networking research. Both of these areas will benefit from federal funding sources. But the funding must be used additively... By this I mean that funding must be coordinated to create a critical mass that can drive prices down and performance up.

b. The two extremes of networking are both problems: ad-hoc point to point circuits may provide performance and latency control, but cost and reuse are both down sides of this model; latency across a general purpose network may be high since hop count may not be controllable...

c. NSF must lead with a vision that encompasses both high end computing and high end network, both the research components and the production components. There are models that support both (see Bob Aiken's MORPHNET model from the DoE). If NSF can create a vision that supports both of these while moving from a backbone investment closer to the edges of the service (for example, funding applications but providing guidelines and certifications for services...) this vision will create the model that supports the critical mass...

d. This may be a hard question... There are so many conflicting requirements in this field. Latency is hard... QoS is hard today, unless one overbuilds and dedicates services. This is changing but it will be a while before it's done... We have succeeded in the past and present by overbuilding and heavily managing. The vBNS is a model of this... The Abilene network is repeating this model. There are other models that may work as well... More on this later...

2.a This is very hard... I think the only way to garner continued support is to create a business model that allows a university to put more services into this high speed bucket and take money from other pots to fund these initiatives, while not giving up services... It's this leveraged service model that I support for the future. The universities are seeing a huge increase in use of ISP services and costs for these are tremendous... The quality of these services is not good... Our core business is not being served as well as it should be by these services for these dollars. If we can use the new initiative and critical mass to create an "Intranet for education and research" across the country, moving all ISP type traffic onto virtual networks supported by the same low level services as the research networks, we win... If we can, in the future, funnel some part of our voice services, especially that which is between our partnering universities over this same infrastructure, if we can utilize the network that we're building to bring competition in commodity network services to underserved areas, we win... If we can eventually utilize the same infrastructure, a very robust and redundant and high performance low level infrastructure to create the virtual networks that we need for high performance computing and network research along side of these commodity services, building competition and pushing commoditization, we all win big.

b. See the above diatribe... I think this is the major reason to combine these services over the low level infrastructure... For example, ATM allows one to utilize VBR services to provide guaranteed bandwidth when necessary and use UBR services on the same pipe but a different virtual circuit to fill the pipe when underutilized with commodity or at least other non-priority traffic... This is the win... MPLS may in the future provide these same services over an IP network...
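
The idea of guaranteeing bandwidth to priority traffic while filling the rest of the pipe with commodity traffic can be sketched as a simple allocation rule: serve the guaranteed flows first, then divide whatever is left among best-effort flows. The sketch below is a static snapshot under stated assumptions, not a model of ATM VBR/UBR signalling or of MPLS; the function name, the flow names, and the max-min fair sharing of the leftover capacity are hypothetical choices made for illustration.

```python
def allocate_link(capacity_mbps, guaranteed, best_effort):
    """Split one link between guaranteed and best-effort flows.

    guaranteed:  {flow name: Mbps currently in use by a guaranteed flow}
    best_effort: {flow name: Mbps the flow would like to send}
    Flow names are assumed to be distinct across the two dictionaries.
    Returns {flow name: Mbps granted}.
    """
    reserved = sum(guaranteed.values())
    if reserved > capacity_mbps:
        raise ValueError("guaranteed traffic exceeds link capacity")

    # Guaranteed flows always get exactly what they are using.
    allocation = dict(guaranteed)
    leftover = capacity_mbps - reserved

    # Best-effort flows share the leftover max-min fairly: visit the
    # smallest demands first, giving each at most an equal share of
    # what remains, so unused share flows to the larger demands.
    pending = sorted(best_effort.items(), key=lambda item: item[1])
    for index, (name, demand) in enumerate(pending):
        fair_share = leftover / (len(pending) - index)
        grant = min(demand, fair_share)
        allocation[name] = grant
        leftover -= grant
    return allocation
```

When no guaranteed flow is active, the best-effort traffic can use the whole link; when one starts, the same call with the updated `guaranteed` dictionary shows the commodity flows being squeezed back without the guaranteed flow losing anything.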

c. I think we will eventually have to do this, but when you charge early on for services like this, you dis-incent the users of the services... What we need is to get the win by combining services and do flat rate pricing in the early stages of the high performance networking and computing, similar to what we did with internet... Today we're paying by the packet, basically and we're all struggling with how to cut back, not how to innovate...

3.a This could be great or it could be bad... If the applications and middleware builders are strongly encouraged to combine their resources, say regionally, and support networks that can do the above leveraging, then we all win... If the apps and middleware folks can put up point to point network links that serve only their app and go away when the project is done, we all lose. We don't get the benefit of driving competition and the commercial markets to where we want to be and miss out... When the NSF net was sold, education and research were 95% of the users... After only a couple of years, education and research were less than 5% and now shrinking further... Who gets heard by the commercials? The commercials, not education and research. Now we have the attention of some of the larger commercial entities since they are seeing the need for more creative energy from the ed/research side to get them the next steps... We had better leverage that and build something that we can keep for the future... This is the basic idea behind Internet2, though I think the current implementation of I2 needs some modification in order to succeed ultimately...

b. Of course, the QoS research that's going on will be necessary... The continued convergence between ATM and IP that's happening is a good thing... Security is a most necessary part of all this... We've ignored it too long at the expense of our campuses today... Security incidents are becoming the most time consuming part of our jobs...

I also think that we need to pursue cross admin domain functionality. The vBNS and Abilene don't really address this issue, especially where ATM is concerned and signalling across these infrastructures... AT&T has stated that they will not pass signalling to any other network providers. This is a safe thing to do, but it's not the way to get us into the future... Middleware for collaborations is another key element that's missing today... Until we get this we'll have to travel and that's expensive...

4.a As I said above, I think building a general infrastructure that can be leveraged to quickly assemble experimental testbeds and infrastructure to support research is necessary... Researchers generally don't want to wait for 60-90 days and deal with phone company procedures to get circuits up to their peer institutions... What the PI's need is some general understanding of the overall area... Most folks want to do the right thing if they understand that it will not hinder what they are doing and will help others. This is hard sometimes to get across... If NSF awards do not require PIs to put their connectivity dollars into a generalized infrastructure for the common good of research (and maybe save some dollars too in the process), some will still do this, and some will not... Education is probably the most effective way to get consistency in this matter...

b. Priorities in my view are to build a vision that will take us into the future, leveraging research, education, and commodity services to create the critical mass to build the networks that we need, sharing these to solve our service needs. They can be financed by paying for networking via awards to applications and middleware but this connectivity must be done in such a way that this leveraging is done, not individual services that are short lived... Hope this makes some sense in light of the previous discussion...

Sid Karin

1.a PACI is an NSF-wide program and must continue to have strong ANIR support. The perception has been that NSF/ANIR covers backbone connections and this is sufficient for the Partnerships. Perceptions are beginning to change, however, as witnessed by the results of the Site Review in October, the NPACI Program Plan's Appendix B (Networking), and the joint Alliance/NPACI Proposal to NSF entitled "Building and Applying a National-Scale Grid: A PACI-wide Next Generation Internet Testbed".

b. An important issue that needs to be addressed is guaranteed and reserved bandwidth, as well as bandwidth usage coordinated with resource usage. For example, suppose an astronomer has a limited amount of time reserved on a supercomputer and a telescope for observations, and the area to be observed depends on certain observable data; bandwidth availability would then be crucial. With guaranteed or reserved bandwidth, the astronomer could analyze incoming data in order to choose coordinates accurately and continue the experiment in real time. A lapse in communication due to lack of bandwidth could cause the researcher to waste time on data that is irrelevant, and perhaps useless, to the research.
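The coordination described in the astronomer example can be illustrated with a minimal sketch. It assumes each resource (supercomputer, telescope, network path) exposes a single reservation window; the function name `common_window` and all times are hypothetical and are not taken from any PACI system.

```python
from datetime import datetime, timedelta

def common_window(windows):
    """Return the (start, end) overlap of all reservation windows, or None."""
    start = max(w[0] for w in windows)  # latest start among all resources
    end = min(w[1] for w in windows)    # earliest end among all resources
    return (start, end) if start < end else None

# Hypothetical reservations for one observing night (illustrative times only).
t0 = datetime(1999, 3, 3, 20, 0)
supercomputer = (t0, t0 + timedelta(hours=4))
telescope = (t0 + timedelta(hours=1), t0 + timedelta(hours=5))
network = (t0 + timedelta(minutes=30), t0 + timedelta(hours=3))

# The experiment can only run while all three reservations are active.
window = common_window([supercomputer, telescope, network])
print(window)
```

In this sketch the usable period is bounded by whichever reservation starts last and whichever ends first, so an unreserved network path can shrink or eliminate the window even when compute and telescope time are both secured.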

c. PACI is an NSF-wide program and must continue to have strong ANIR support. Ultimately, it is the NSF's responsibility to support the research that the private sector will not. In regards to the user community, i.e. industry and researchers, the NSF may not play a major role, but the commercial world does not support network researchers. In other words, network researchers worry about performance while the private sector worries about price over performance. Therefore, for network researchers, the NSF is the main source to look towards for funding for this type of research.

d. N/A

2.a Campuses maintain strict control over their campus infrastructures and the network connections to those campuses. PACI should serve as a resource to assist these campuses in better meeting the needs of PIs participating in PACI. Relationships need to be cultivated with the CIOs on campuses to assist them in better understanding the benefits of PACI partnership and of upgrading the campus infrastructure to levels enabling PIs to utilize high performance networks. Beyond PACI serving as a broker, this is an issue of commodity vs. high-end experimental networking that will be funded by the private sector or government agencies, respectively.

b. Use and publicize appropriate metrics, do not promulgate the 24 hour bandwidth usage.

c. Yes, institutions will move toward cost-recovery, and high-end experimental networks will be left 'holding the bag', in other words, looking for funding from other sources such as the NSF.

3.a Five years from now, people who use the high performance networks will be divided into two sub-groups, network researchers and researchers who use the high performance network. The first group's intellectual curiosity will be to study the actual workings of the high performance network, i.e. routers, switches, backbone, bandwidth, and speeds of packet transfers. The second group will merely be using the high performance network to satisfy their intellectual curiosity that focuses on disciplines including but not limited to, computer science, astronomy, biology, marine science and chemistry. The experiments that both groups will be working on will change over the next five years; some of which we cannot even begin to imagine, today.

b. In five years, the network will possess many differences including greater bandwidth and features that do not currently exist. These features might include 'quality of service' technologies, reservations for bandwidth, guaranteed bandwidth, and end-to-end security.

4.a Alliance and NPACI responsibilities would be to serve as a broker for PIs, to set performance levels, to provide support and education analogous to the Scientific Computing Group, and to develop a form of certifying the performance of partners.

b. This is also an issue of commodity vs. high-end experimental networks. Each will be financed from separate sources.

Ken Klingenstein

1. The primary barrier to better utilization of advanced distributed resources is the lack of ubiquitous and interoperable middleware. While there is much discussion about what is in middleware, a rough hierarchy can be identified, with underware such as identity, multicast and DNS, basic middleware such as authentication, directories and authorization, and advanced middleware such as shared state for collaboration, coscheduling of distributed resources, etc. (The hierarchy is based upon dependencies.) Underware, with its close association with local physical infrastructure, can be assumed to be a campus-provided service. (Obviously there are significant off-campus issues in each of these areas.)

It is essential that the basic middleware for distributed research groups work within the context of the primary institutions that researchers belong to. Such utilities as authentication and directories for PACI researchers ought to be consistent, in approach and user interface if not in implementation, with their campus utilities. There needs to be an alignment between the critical if pedestrian needs of campuses for such middleware and the activities of PACI.

Advanced middleware is not likely to be developed on campuses, given both the press of business and its lack of immediate relevance to broad campus needs. Thus advanced middleware is appropriate grist for PACI and other advanced computing and collaborative organizations. In the development of this upper middleware, care should be taken to build upon the campus-based basic middleware.

NSF could have two vital roles in these developments: to nurture the development of basic middleware on campuses and to directly support the creation of the advanced middleware needed for distributed research activities. Nurturing the development of basic middleware on campuses must recognize that the challenges are not only technical, but also in design and policy. Thus a three-pronged approach, proceeding in parallel, is needed to create or shape (in the case of commercial product development that may not meet the unique needs of academia) basic technologies, to identify effective campus design strategies (such as directory schema, security zones, etc.) and to push the policy stakeholders on campus (such as registrars and legal counsels) to resolve the complexities that FERPA, Open Records, etc. introduce.

SLA's will have important but limited roles. It will be important for campuses to validate performance characteristics from the campus egress to remote resources. On campus, however, a variety of ad hoc approaches to network performance (using primarily overprovisioning) will likely mean that SLA's will not be used locally.

2. It is worth some deeper analysis to see how the network traffic patterns of researchers fit within broader campus patterns. On larger campuses, 6 Mbps per average desktop times 20,000 ports comes to 120 Gbps. In this perspective, researcher use spikes may be less significant than in today's traffic. The moral may be to raise the rest of campus usage so that research use is less conspicuous.
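The aggregate figure quoted above can be checked with a short calculation. This is a back-of-the-envelope sketch only: the per-desktop rate and port count come from the text, while the 622 Mbps research burst is a hypothetical value chosen purely for illustration.

```python
MBPS_PER_DESKTOP = 6   # average per-desktop rate from the text (Mbps)
PORTS = 20_000         # number of campus ports from the text

def aggregate_gbps(mbps_per_port: float, ports: int) -> float:
    """Return total campus demand in Gbps (1 Gbps = 1000 Mbps)."""
    return mbps_per_port * ports / 1000

def burst_share(burst_mbps: float, mbps_per_port: float, ports: int) -> float:
    """Fraction of the aggregate demand that a single burst represents."""
    return burst_mbps / (mbps_per_port * ports)

total = aggregate_gbps(MBPS_PER_DESKTOP, PORTS)
print(f"aggregate demand: {total:.0f} Gbps")

# Hypothetical 622 Mbps research burst (an assumption, not from the text).
share = burst_share(622, MBPS_PER_DESKTOP, PORTS)
print(f"burst share of aggregate: {share:.2%}")
```

Under these assumptions a single research burst is well under one percent of the campus aggregate, which is the sense in which research spikes would become "less conspicuous".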

But if it is the case that research network traffic does indeed require consequential funds, then the next step is to examine the location and type of the costs. If the cost is at the egress (say in purchase of premium traffic service) then a set of yet-unknown issues (e.g., does the premium service also improve the regular service, is it priced per use, etc.) will determine institutional policies. There are not likely to be sophisticated QoS charging schemes until management tools improve greatly. The costs that are in the campus infrastructure (for example, running gigabit Ethernet to a desktop from a special backbone) are likely one-time and can be promoted through traditional academic processes.

3. It is worth considering what the ubiquitous campus computing environment will be like in five years and what infrastructure is needed for that. Identity and authentication services will need to be deployed to support out-sourced modem pools, library database accesses, and on-line student transactions. Authorization will be needed to drive workflow systems and manage transaction systems. QoS software will be used to meter usage of external resources for distance education. Directories will enable students to sit down at any keyboard in a public computing lab and have their bookmarks and aliases readily available. Distributed file systems will put their "locker" at their desktop. Administrative applications will be driven by centralized middleware and, in turn, feed those repositories with automated feeds and updates.

It is important to note that the services above must be multiplatform, not only in the clients that they serve, but in the core servers that operate the services. For example, identity for users must span the traditional Unix orientation to serve for modem pools and for NT servers. The services must have applications that use them in almost all areas; for example the lack of a secure ftp could compromise the security of all other apps, since users tend to use the same password in secure and insecure environments. The services must also have interoperability with each other and with different campuses. It would be useful to have public domain implementations of key services and applications.

4. The Alliance and NPACI are in a difficult position, hampered in an area that they have little opportunity to remedy. Moreover it is frustrating that the technical challenges are often overwhelmed by issues of policy and scaling.

The Alliance and NPACI should focus on creating the specialized superstructures necessary to support advanced applications and seek to influence campus development of the basic middleware services. It should facilitate in the designation of top-level services providers (for certificates, for directories, for name service, etc.) in the key areas where Alliance and NPACI customers are located (.edu, .gov, etc.) It should also work for technology transfer mechanisms in areas specific to advanced networking and computing, such as tuning high-performance workstations, effective use of QoS, etc.

Bill Lennon

1. Commodity access will always lag the demand of high end users whose very demands shape the emerging generation of routine applications. A mix of unique and commodity resources will be required. Incrementally lower-cost services won't drop prices fast enough to satisfy the needs. ANIR must focus on joint ventures between industry, university and government research labs. It is the only way to leverage the best of each domain and get the cost effective deployment of broadband network resources. It is further the fastest way to migrate good ideas into volume, commodity deployment. Unfortunately, in general, legislative relief from the Federal Acquisition Regulations may be required to foster the collaboration.

2. When NSF/ANIR phases out its direct support of high performance connections, the funding increases to individual PIs must compensate, and a competitive / collaborative infrastructure must exist to guarantee that the resources are available to compete for the moved "network dollars."

Again, I feel that the availability of anything beyond incremental price / performance improvements will be dependent upon a government/industry partnership in which government shares the risk for initial, far reaching technology platforms upon which the ideas needed in five to ten years are tested in the bright light of a broadly available, large fast infrastructure that complements the incremental price/performance advances of commodity services.

3. Visionaries succeed in articulating a vision. They promulgate the vision widely and engage the R&D community in implementing its components. Success is achieved when the implementations are "ubiquitous" -- high volume production and use with easily mastered human interfaces. A capability is ubiquitous when you notice its absence rather than its presence.

Political decisions will affect resource siting. Independently of that, the resources that will be clustered at advanced computing sites will reflect the current technology about to be deployed in 'volume' production. In five years the computer science research community will be using the tools produced in volume as a result of today's visions, implementing the current vision and formulating the next. The implementations will display the continued cycles of concentration and distribution during the evolution.

Central to this flow of computing and storage between central sites and remote sites are the networks -- the bigger and faster they are, the more rapid the cyclic evolution in which new capabilities are alternately centralized or distributed depending upon the current technology. Central to deployment of the networks is the application driven business case required to justify funding the infrastructure. It will necessarily lag the computer science R&D but both communities will benefit from promulgating needs and capabilities into each others' domains.

In my vision, higher and higher speed networks eliminate the need for workers to be tethered to any given site to access either colleagues or resources. Further, the business community has recognized the value of such collaborations and willingly participates and contributes resources since they believe it is in their best interests to do so.

4. The only resource that consistently requires centralization is expertise. Further, there isn't enough to go around. I believe that every effort should be made to centralize training, education and support -- using network technology to distribute that 'central' function so that the "sun never sets" on the "help desk." The tools are just now emerging (but not yet "ubiquitous") to implement the "distributed center." I'd invest in fixing the human interface to that capability to eliminate as many impediments between a scientist or student and his or her resources as possible.

Tracie Monk

1.a-c NSF's financial support of the actual transit pipes should continue for key resource centers and sites where cutting edge research requires extensive bandwidth. But equally important, NSF should now take steps to extend its assistance (and equivalent expectations) to end-to-end support of the advanced networking infrastructure. These end-to-end requirements range from:

Addressing these points will help to ensure that community resources (particularly NSF-sponsored PACI resources) are accessible and usable by distributed researchers. It will also require that NSF/ANIR expand its traditional definition of infrastructure support (currently synonymous with transit/backbone support) to include enabling technologies and services and end-to-end networking.

d. In terms of Service Level Agreements, large commercial customers are beginning to insist on SLAs with their providers and are establishing means of monitoring or verifying that they are receiving the promised levels of service. Academia will probably follow suit soon. The question is whether a university's SLA (for its commercial Internet service and/or R&E network service) will be determined solely by the office of the CIO. If PACI researchers and resource sites have measurable service level requirements, then they need to articulate these requirements and ensure that these requirements are addressed during provisioning of the campus infrastructure and negotiations with network providers.

2. While NSF support for the advanced networking requirements of some researchers and support for the community's access of critical resources should continue, it is important that Universities start the process of redefining how they approach their networking costs. Some suggestions include:

3. The pace of technological advancement in the Internet is continuing to accelerate, making networking research increasingly critical and increasingly difficult. Any aids with respect to middleware and related tools are very important. Middleware-ish technologies like caching and information retrieval aids are also vital to users being able to easily locate and retrieve information. Another area of great importance to our group (CAIDA), and to other distributed/collaborative organizations, would be more effective collaboratory tools (interactive audio/visual/data capabilities) and more intuitive interfaces to important technologies such as multicast.

4. PACI (NSF, NPACI and Alliance) has a responsibility for making the partnerships work. Networking should be viewed as the vital link connecting PACI's disparate parts. This calls for direct support of all partnership participants, from desktop tools, to applications-design tools and support for researchers, to LAN and WAN support for researchers (e.g., end-to-end high performance h/w, s/w, and connection -- as well as technical support to ensure it works), to education and feedback for an emerging community of users of advanced networks and of developers of applications to run on those networks.

 

Peter Taylor

1. I am unhappy with the primary assumption here. Would not the greatest contribution ANIR could make be to support the establishment and operation of one or more high-speed networks designed to provide just these services (metacomputing, for instance)? What other infrastructure could ANIR provide that would help in this regard?

The most critical end-to-end issues (accepting then the assumption that high-speed long-haul traffic will be carried commercially) will undoubtedly be the usual last-mile problems of connections to and within campuses. This is where ANIR needs to work with local campus networking in order to ensure that the access needed is provided. As far as NSF's responsibilities are concerned, I think there is a perception problem here that, for instance, the PACI program is an ACIR program: fundamentally, it is an NSF program that affects essentially every office within the Foundation. Once this is recognized, it becomes clear that a national networking program is an NSF program, not merely an ANIR program. In this sense NSF should have the responsibility of ensuring that the networking infrastructure can support the other programs (e.g., metacomputing within PACI) as required. I am not qualified to comment on SLAs.

2. As noted above I have very mixed feelings about the basic premise here. Accepting it for the purpose of discussion, the answer to the first subquestion is straightforward: the usual methods. That is, the more politically skilled or glib groups will be more successful than the less glib. I think it is unlikely that discipline-specific funding could be used for this sort of thing: that is, I doubt that one could persuade, say, NSF's Chemistry Division to pay this for computational chemistry activities.

The way to justify the use of such networks in a "bursty" mode is to point to the quality of the science, etc., being performed. This is not dissimilar to having to justify idle time on a large computer generated by reserving most or all of the machine for a specific large calculation. In that case the research has undergone a peer-review process that indicates its worth, and this process can factor in the consequences such as idle time or delaying other users. I can only speculate as to what institutions will do, but it seems likely that campus networking directors will turn to whatever is needed to balance their budgets. Cost-recovery is a pretty easy approach from their point of view --- it is rarely successful in getting the best science done and is sometimes strongly counterproductive, however.

3. Our (quantum chemistry's) main benefit would be the ability to run metacomputing calculations that could take advantage of computational resources at many sites. This would increase the total computing power available to us to perhaps 100 TFLOPS, assuming that ASCI-class sites were accessible as part of this. For methods with good scaling, this could mean an order-of-magnitude increase in the size of systems that could be studied. The benefits would be priceless. The main requirements are metacomputing software and the networks!

4. This is completely bound up with my response to Q1. If NSF shoulders its responsibilities and supports the basic infrastructure, the PACI partnerships can do much of the rest, especially the education and training component. If NSF will not shoulder its responsibilities, the partnerships obviously can do little more than help the users cope with an inadequate infrastructure.

Doug Toussaint

1.a I'm not sure how to answer this. Clearly the centers need to keep in close touch with the research groups to identify problems, which may not be the problems that are expected.

1.b The chief concern of most computational scientists is high bandwidth data transfers and interactive access to remote sites without undue delays. Reliability, in the sense of always being able to make such transfers, is important. Simplicity of use and uniformity of interfaces across all the machines in use are important. Security is an important issue -- most of us have lost research time while various machines were fixed after being hacked.

1.c I am personally skeptical about commercial carriers' ability to provide this level of service, or their interest in doing so. I suspect that NSF could make the scientific community more productive by continuing to play an active role in "infrastructure" type networks, in addition to "research about networks".

2.a The best way to handle this is to emphasize the wide range of usage levels, extending from the few groups that regularly require real time video on down to the many people requiring smaller resources, such as Netscape connections to the preprint servers. This has implications for the design of the network --- a network serving only a small number of users is unlikely to get support.

2.b I can't help you on this one!

2.c I suspect there will be such moves. The most recent proposal at my institution was for a monthly charge per IP address.

3.a Most of the computation intensive applications in my discipline (theoretical physics) are sensitive to latency times, and so are unlikely to be done on widely distributed systems. What should be in place in five years is an environment where codes and data can be conveniently moved to the available computing engines. Of course, this is possible now; the key is to develop the software and hardware infrastructure to make it easy to do.

4.a I would hope that Alliance and NPACI would play important roles in all aspects of developing and operating networks for computational scientists. One aspect which is not touched on elsewhere in these questions is that Alliance and NPACI are particularly well placed to evaluate new products, make them easily usable, provide documentation, distribute them to researchers, and encourage their adoption. The classic historical example of this is Mosaic. More recently, I downloaded and installed the secure shell package from SDSC.


for more information:   info @ caida.org last update:   Mar 3 20:30:47 1999
this page: http://www.caida.org/PACI/index.html