
Questions:
1. Assuming that connections to high performance Internet backbones
evolve into a commodity service (to be purchased by universities
from various commercial providers, gigaPoPs, or Abilene):
2. Assuming that NSF/ANIR phases out its direct support of high performance connections:
3. Assuming increased funding supporting applications and middleware:
4. Assuming that a lack of network engineering and network equipment
resources is inhibiting SDSC's and NCSA's abilities to fully support
the end-to-end requirements of partners, e.g. with respect to
distributed applications and distributed computing:
Andrew A. Chien
1.a. The primary challenges facing folks who wish to prototype
grids and develop application-enabling middleware are the availability
of bursty high performance, guaranteed high capacity with reservations,
and the availability of high speed onramps. These limitations
not only inhibit the development of a large number of significant
novel wide-area science and engineering applications, they also
inhibit the development of novel applications for future generations
of high performance network services. It's unclear if the commercial
services can provide the high bandwidth, reservable communication
capacity *required* to perform scientific and controlled experiments
and to prototype both applications and systems for the future.
b. High speed on ramps for partners to WANs, and high speed paths
to other partner sites and leading edge sites. Also, the lack of
stable funding to support development of these infrastructures,
both to support the PACI's efforts to become Grids and to become
high performance Grid testbeds.
c. I would hope the NSF would take the lead in supporting the
infrastructure to make the above science, engineering, and grid
research possible. The bursty, high bandwidth use of the high
end applications makes setting up occasional high bandwidth connections
with the commercial providers difficult economically (big $$$'s)
and makes the obstacle to experimentation far too high. If there were
a facilitated way that NSF could pool users and share a fixed
capacity on a scheduled basis (batching a la the successful MOSIS
microelectronic fabrication program perhaps) that would be very
helpful. However, for this to work well, we would need this
to be low overhead to get access, connect almost everywhere, and
be subsidized to be quite inexpensive.
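The idea of pooling users onto a fixed shared capacity on a scheduled
basis can be sketched in a few lines. This is only an illustration
under stated assumptions: the class name SharedLink, the hour-based
time units, and the 622 Mbps capacity are all hypothetical and are
not part of any NSF or MOSIS mechanism.

```python
class SharedLink:
    """Hypothetical fixed-capacity link booked in advance by many users."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.bookings = []  # list of (start, end, rate_mbps, user)

    def peak_load(self, start, end):
        """Highest already-booked rate at any time in [start, end)."""
        # Booked load only rises at booking start times, so it is enough
        # to check the window start and each booking start inside it.
        points = [start] + [s for s, _, _, _ in self.bookings if start < s < end]
        return max(
            sum(r for s, e, r, _ in self.bookings if s <= t < e)
            for t in points
        )

    def reserve(self, user, start, end, rate_mbps):
        """Book the slot if it fits under the shared capacity."""
        if end <= start or rate_mbps <= 0:
            raise ValueError("need end > start and a positive rate")
        if self.peak_load(start, end) + rate_mbps > self.capacity:
            return False  # would oversubscribe the shared capacity
        self.bookings.append((start, end, rate_mbps, user))
        return True


link = SharedLink(622)                        # assumed capacity in Mbps
ok_a = link.reserve("site A", 9, 11, 400)     # fits on an empty link
ok_b = link.reserve("site B", 10, 12, 300)    # overlaps A, exceeds capacity
ok_b2 = link.reserve("site B", 11, 13, 300)   # shifted by an hour, now fits
```

In this sketch a rejected request is simply rescheduled by the user;
a real service would also need the low-overhead access and subsidized
pricing described above, which code alone does not address.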
d. I'm not sure about SLAs, except to say that I don't know how
you can tell the service is satisfactory unless you have some
monitoring and performance data, and the only way to ensure that
you get good service is to tie payment to the quality.
2. I don't like this assumption, as I believe it will lead to
a stall in high end network research. Future networks will be
even more predominantly hordes of 28.8K users. Not a bad thing,
but we'll miss a golden opportunity with respect to enabling high
performance distributed computing for science, engineering, and
other large scale computing challenges of national, global, or
even regional interest.
a. Universities in general are unlikely to be able to find the
money. They're strapped just trying to wire 10 and 100Mbps to
dorms, departments, and offices.
b. See my MOSIS comment above. Regarding low bandwidth utilization
concerns, one possibility is to allow low speed users on the network
but at a lower priority and allow them to be squeezed out when
high bandwidth users need the bandwidth. Thus, the network can be a
commuter lane which can be monopolized by high performance applications
in priority over the lower speed users. May need some modest routing
advances to make this easy.
c. Already starting to happen. Again, this will be bad for high
end networking research unless we can find a way to fund this. The
cost of experimentation at the high end will explode (because we're
climbing an exponential in capability). This means that to experiment
for N years out, we may need 10 or 100x the bandwidth, and I don't
see how the PIs or institutions can pay 10x or 100x in $$$'s.
3.a. Easy to use high bandwidth wide-area computation. Manipulation
of 10 terabyte data sets (yes terabytes) by ordinary users on
a routine basis through high end graphical interfaces. In 5 years,
a TB of storage will be approximately $2K. Globalized network
services and information with only modest concern for locality.
Unified data and service access around the country and world.
b. Widespread deployment of *experimental infrastructure* for
wide area computing, storage, networking, and application experiments.
These resources need to be made network accessible, available with
minimal overhead, and available at minimal cost. Organizations like
the current PACI's have difficulty doing this because their platforms
cost so much. Further, they are monitored for "system utilization".
Clusters are the natural technology for this and recent advances in
communication, security, etc. make it possible to build versatile
resources that can be safely shared. Given this infrastructure, the
technologies will be developed.
4.a. I don't really know how to answer this question, as I don't
understand it. If the NSF wants true distributed partnerships,
the networking is as important as the large platforms. We all appear
to agree that the partnerships have the potential to be even more
successful and dynamic in advancing the progress of science and
engineering, and further they are the right model to influence
the directions these communities must take to reap the bounty
of future technologies. Given that, I believe there is no choice
and the NSF and perhaps other agencies must find the wherewithal
to fund this activity properly. Perhaps these resources could be
found and their deployment overseen by networking and application
researchers at the partner sites (not the partner networking
organizations).
b. I believe I answered this above.
1.a A program that would award funding or other considerations
based on competitively reviewed proposals for innovative and developmental
use of high performance backbones would help. In the case of funding,
it would undoubtedly be true that some connections that would
be essential for development work by PACI groups would not be
available or would have to be paid for. Having a mechanism for
securing such funding would be essential to the success of the
PACI programs that depend on high speed networks. Another aspect
would be that some programs might require very high speed access,
but for very limited periods of time. A process by which such
access could be scheduled, so that other users were routed to
slower network connections, might enable the R&D work of the PACI
teams to proceed. Another similar example would be a mechanism
to ensure quality of service for R&D work that required it.
2.a (1) Extension of high speed networks to PACI partner sites.
The vBNS connected a relatively small number of sites and apparently
worked well if everyone involved in a project were so connected.
In my case, which involves remotely located instruments, vBNS
connections were not available, and there was no large organization
with deep pockets that could provide the cost sharing funds needed
for a vBNS proposal.
(2) Quality of service. Although distributed computing or remote
instrument control would not generally need very high speed network
connections continuously, it would be essential to have guaranteed
access for R&D projects for relatively short periods, that could
be scheduled in advance, at minimum stated rates.
b. The NSF might be in a position to provide necessary funding
that would enable the above. Funding for extension of high speed
networks to sites off the beaten paths would be especially important.
c. SLAs obviously could be a mechanism for obtaining the quality of service guarantees that would be necessary.
2.a Universities tend to look at things from three points of view: (1) spending their own money to enable the work of a large number of people, such as students; (2) providing seed money for a small group of people that might generate large grants in the future; or (3) funding something of very high visibility and prestige that will bring a bright and favorable spotlight. Only the last two seem applicable. As seed money, the project itself must involve science that looks like it might attract funding from the individual discipline if a trial or an R&D effort is successful. Some PACI projects might qualify. Probably the more promising is selling the idea that leading edge research on development of the next generations of computing applications that would depend on future very high performance networks would attract the limelight and be worth financial support.
b. The best way would be to set up an infrastructure such that there could be limited periods of guaranteed very high bandwidth service for bursts of R&D work, with the "excess" bandwidth being available to the general community when not being so used.
c. I know of no plans for this.
3.a I am working to support remotely located radio astronomy synthesis array radio telescopes. What we would like to do is transmit the data in real time (a fairly low bandwidth, continuous application) to a supercomputer center, have the data processed into first-order images in near real time, and have users located anywhere in the country be able to visualize the (usually) 3D data sets and to change the observation based on what is being seen. Later, more sophisticated data processing will take place. The need is for the results to be available to users not located at the supercomputer center for real-time steering of the compute-intensive calibration, deconvolution, mosaicing, visualization, and scientific analysis. Astronomers generally do this now on local workstations, which require months of time to process hours of data. So the vision of the future is for astronomers to have telescope data available at supercomputer centers in near real-time, to process those data in times comparable to data acquisition times with the use of remote supercomputers being largely transparent (virtual co-processors in desktop workstations), to bring in other data sets from distributed archives, to get all the results back on their workstations for analysis, and to work collaboratively with colleagues at other sites. Such a seamless networked telescope/computer/archive system would significantly improve the efficiency and productivity of telescopes and astronomers.
b. The network infrastructure is the major missing part of the system that we have no ability to acquire or build on our own.
4.a The vision is GRID computing. If we cannot effectively use distributed resources and parallel computing systems, that vision becomes myopic at best. Using Alliance and NPACI clout to help obtain needed network resources is essential. Unless the remote supercomputers can appear largely transparently as super co-processors in users' workstations, it will not be worth the trouble for most people to use remote supercomputers. Hence, the MAIN responsibility of the Alliance and NPACI must be to enable their vision. If (as is usually the case) the bottleneck is network access, the Alliance and NPACI should take a pro-active role in solving that problem, not just shrug shoulders and say too bad. If (as is often the case) the WALL clock time to run a job is comparable to the wall clock time of running on a departmental-level system, a high Mflop rating when the job is run is irrelevant. Resource scheduling so users actually see supercomputer WALL CLOCK speeds should be a main responsibility of the Alliance and NPACI.
b. Priority 1 should be effective scheduling so real (WALL clock)
supercomputer speeds can be achieved. This does not require significant
funding. One way to help achieve this would be to keep workstation
class jobs off the systems. Another might be to set up wall clock
schedules, so user A would know that he would have the resources
he needs between (say) 4-8 pm.
Priority 2 should be working toward improving network access for
Alliance and NPACI researchers. This is as important as 1,
and the Alliance and NPACI should use their clout to help. But
ultimately I think the NSF simply has to recognize the importance
of this and fund it.
1.a The PACI program serves computational scientists and engineers
by advancing the deployment of National-scale computing and communication
services. The success of PACI program grantees is measured by
how well scientists and engineers use these technologies. However,
both the current vBNS and the Abilene effort have been serving
the needs of campus CIOs as an expedient means to rapidly install
interoperable, nearly universal high-performance networking between
R1 institutions. The vBNS program has been extremely helpful in
creating a market for high-bandwidth communications in that it
shared the costs of local loops and backbone attachment with universities.
ANIR should now focus on awards to scientists and engineers who
use the infrastructure provided by the NSF with the clear indication
that these awards help pay for recurring network costs. These
awards should require a senior academic scientist as PI, with
colleagues and the campus CIO as co-PIs. Clearly, high-bandwidth
use of the networks is a key criterion, whether it be for collaboration,
large-scale data access, distributed computing, or education.
Proposals that include a component of using advanced/emerging
network technologies (like security and Quality of Service) or
advanced Grid techniques (like multi-site parallel computing,
visualization/virtual reality, etc.) should be encouraged. Large
cooperative agreements with PACI leading edge sites and partners
could specifically focus on Grid services development and deployment,
with smaller grants to research universities helping with the
costs of networking. As with most NSF infrastructure programs,
substantial cost sharing would be a requirement.
b. Interoperability of networks (IP/ATM and IP/Sonet), deployment
and support of Grid services, interoperability with other Agency
and foreign networks, advancing QoS and multicast.
c. ANIR should lead the Federal Agencies and international networks
by funding applications that use the technology as well as the
technology itself. A leadership role in training and outreach
for users of the technologies would be an appropriate goal for
NSF's support as well.
d. SLAs are needed to enable advanced users to tune application
performance.
The networking community needs to work with scientists and engineers
to create the tools they can use rather than hide this information.
NSF really needs to help bridge the gap between computational science
and network engineering, just as it has encouraged computational
scientists and computer scientists to collaborate.
2.a Essentially the same methods used to justify high-performance
computing for leading computational scientists: the awarding of
competitive proposals (not more than one to a university) to cost-share
the networking costs, particularly favoring those using advanced
services coupled with peer-reviewed science and engineering applications.
b. The high-performance networks depend on a profile of highly
bursty traffic to avoid melt down. What should be encouraged is
more applications, not high sustained utilization.
c. Cost-recovery is likely; my recommendation is to make it a
competitive way of life by assuming, as we consider the cost of
computing equipment, that network access is essential to science
and engineering. Commodity traffic should be provided by the university,
of course, but high-performance applications should at least partially
pay the extra costs through grants.
3.a Right now, my discipline (shared virtual worlds) is desperately
in need of guarantees of bandwidth and latency. It will not become
a practical technique for collaborative scientific discovery without
QoS and operating system enhancements that take advantage of
multiple flows with differing characteristics (such as unicast/multicast,
secure, lossy/lossless, priority, and so on).
b. Supporting the Grid development and deployment across agencies
(and internationally) is the most effective leveraging scenario,
in my opinion. As a large scale effort, this will help drive the
technology development and implementation, lead to quick adoption
of standards, and encourage outreach activities, leveraging off
and integrating with the existing structures already supporting
large scale computing, visualization and data access.
4.a Alliance and NPACI should develop and deploy the Grid (cooperating
with DARPA, NASA, DoE).
b. PITAC addressed these issues; its recommendations should be
followed and implemented.
1.a Middleware research efforts will always need to be focused
on the next generation of capability so that the software can
be ready when the infrastructure becomes commonplace. Therefore
the middleware research community (i.e., metasystems researchers)
will need access to the highest bandwidth systems. Further, these
high-speed networks will need to be connected to an interesting
set of resources, e.g., the SC centers, large scale clusters,
interesting databases, instruments, etc.
b. Human infrastructure. The physical infrastructure is in pretty
good shape (of course it can always be improved), but one of the
most difficult challenges today is finding first class programmers
and researchers that are intimate with Unix, networking, etc.
AND then paying them with university salaries. These people are
the same ones that industry wants most. As an anecdotal example,
I just lost one of my best systems guys to a beltway bandit -
they are paying him TWICE what I was paying him - or could pay
him. Indeed he is now making more than all but the most senior
faculty in my department. Similarly, SDSC has been losing system
staff to industry, and now has almost a dozen open slots.
Another critical issue is service fungibility - to make metacomputing
work most effectively requires that we move tasks from place to
place. Current policy is that a user has unit allocations on a
particular machine. We cannot "trade" the allocations around the
system. By setting up trading, and pricing resources based on
their value to the community (i.e., using 32 nodes on a 1000 node
machine would "cost" more than 32 nodes on a 64 node machine --
discouraging small jobs on the large resource) we could improve
average response time. (Classic M/M/k queue vs. k independent
M/M/1 queues.)
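The queueing comparison in the parenthesis can be made concrete with
the standard textbook formulas. The sketch below assumes Poisson
arrivals and exponential service times (the M/M assumptions), treats
every machine as one identical server, and uses illustrative rates
that are not measurements from any center.

```python
from math import factorial


def mm1_response_time(lam, mu):
    """Mean time in system for one M/M/1 queue (arrival rate lam, service rate mu)."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)


def mmk_response_time(lam, mu, k):
    """Mean time in system for one shared M/M/k queue, via the Erlang C formula."""
    a = lam / mu   # offered load in Erlangs
    rho = a / k    # utilization per server
    if rho >= 1:
        raise ValueError("unstable queue: need lam < k * mu")
    # Erlang C: probability that an arriving job has to wait for a server.
    tail = a ** k / (factorial(k) * (1 - rho))
    p_wait = tail / (sum(a ** n / factorial(n) for n in range(k)) + tail)
    # Mean queueing delay plus mean service time.
    return p_wait / (k * mu - lam) + 1.0 / mu


# Assumed example: 4 sites, each 80% loaded, service rate 1 job per time unit.
k, lam_per_site, mu = 4, 0.8, 1.0
separate = mm1_response_time(lam_per_site, mu)     # allocations tied to one machine
pooled = mmk_response_time(k * lam_per_site, mu, k)  # tasks free to move anywhere
```

Under these assumptions the separate queues give a mean response time
of 5 service times, while the pooled queue gives roughly 1.75, which is
the effect that tradable allocations are meant to capture.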
c. From the human resources side it is hard to see what you could
do. At UVA we have considered setting up an independent unit
to hire the folks so that we could side-step the university's
salary limits. To do that, though, requires that the funding agencies
(e.g., NSF) recognize that good people cannot be had for $55K
- that they are going to cost $80-$110K. It is a lot of money,
but except for the dedicated few willing to make their families
suffer we cannot get the people we need.
d. I think that they are critical for the networks. If the telecoms
do not have contractual obligations they will provide the minimum
they can.
As to the compute and data resources, see my comment above about
the need to be able to trade resources.
2.a My university will not pick up the cost. We are a state institution.
The CIO will not be able to justify bandwidth in excess of what
the university community as a whole needs. They have matched so
far because it was matching, and there were several other researchers
who needed improved bandwidth.
b. I think that the characterization above will change over time.
As we move from stunt applications to an integrated environment
there will be much more "background" traffic as the underlying
metasystem moves binaries around, copies data files, provides
remote data access etc. I think the "low bandwidth utilization"
is a result of the software to make the metasystem easy to use
not being ready.
c. We are not moving that way. Nor can I foresee a time when that
could possibly happen. If it did, then poor departments (e.g.,
English, History) would get lower quality service. At UVA those
are the powerful departments.
3.a The environment for high performance scientific computing
will change significantly over the next five to ten years. Today
users are very aware of where physically their programs are executing,
where their data resides, and when a processor fails. In the not-too-distant
future that will not be the case. Instead users will manipulate
objects, applications, data files, instruments, ongoing simulations,
etc. The underlying system will manage scheduling, accounting,
data and binary migration, security, and fault-tolerance in a
heterogeneous, physically distributed computational environment
composed of physical resources controlled by multiple administrative
units.
The state of the art
Today users typically must decide where to execute their applications.
They can choose to run it on their workstation (if it will fit),
on a local high performance machine, e.g., at a university or
lab-wide computing center, or they can execute at a supercomputer
center. The choice involves many trade-offs. First there is the
scheduling issue: which choice will result in the fastest turn
around? This is particularly acute when the user may have accounts
at multiple centers. How is she to know which is likely to result
in the best performance without manually checking each? Then there
are the inconveniences of using remote resources. The data may
first need to be physically copied to a remote center, and the
results similarly copied back for analysis and display. This data
copying may be further complicated by the need to synchronize
with collaborators to have them copy their data as well. Then
there are the administrative difficulties of acquiring and maintaining
multiple accounts at multiple sites.
The net result is that users today spend a great deal of time
manually structuring and managing their computations. In effect
today's computational scientists must become high-performance
computing experts in order to achieve good performance on their
codes. Rather than trying to understand the science of their application
they are spending their time trying to outguess the computer.
The Future
The resources contained in future metasystems will include compute
engines (from high-end supercomputers to workstations and personal
computers), digital libraries (ranging from traditional text-based
document collections to very large scientific databases), physical
devices and specialized instruments (e.g., telescopes, microscopes,
machine tools, satellite imaging systems), and high-speed communication
links. The resources will be owned by many different individuals
and organizations each with different objectives, and few of them
altruistic enough to just give away their resource.
Metasystems of the future will support the illusion of a single
virtual machine to users, a virtual machine that provides secure
shared object and shared name spaces, fault tolerance, improved
response time, and greater throughput.
The potential benefits of a successful metasystem are enormous
and include: (1) more effective collaboration by putting collaborators
in the same virtual workplace; (2) higher application performance
due to parallel execution and exploitation of off-site resources;
(3) improved access to data and computational resources; (4) improved
researcher and user productivity resulting from more effective
collaboration and better application performance; (5) increased
resource utilization; and (6) a considerably simpler programming
environment for the applications programmers.
b. Human resources.
4.a I think that this is a middleware problem. I think that Legion
and Globus will alleviate this problem by simplifying the use
of the distributed resources. What is needed from the centers
is support for the middleware software (people on the ground),
and training of the user community in the use of the middleware.
b. People on the ground at the centers working on the middleware
as partners with the development teams.
I don't know where the money should come from. Let me state though
that if the metasystems projects fail (in the sense that they
do not significantly lower the barriers to use of the distributed
resources) then the very high speed networks will only be used
by a few very dedicated scientists .... and in a sense the networks
will fail too.
1.a I don't believe that the networking requirements of advanced
computing will be met with off the shelf commercial services in
the near term, at least not in a cost effective manner. It will
require a collaboration, as in the past, between research and
education networking and the research computing areas to build
the critical mass to support a network infrastructure providing
both high performance production services and networking research.
Both of these areas will benefit from federal funding sources.
But the funding must be used additively... By this I mean that
funding must be coordinated to create a critical mass that can
drive prices down and performance up.
b. The two extremes of networking are both problems: ad-hoc point
to point circuits may provide performance and latency control,
but cost and reuse are both down sides of this model; latency
across a general purpose network may be high since hop count may
not be controllable...
c. NSF must lead with a vision that encompasses both high end
computing and high end network, both the research components and
the production components. There are models that support both
(see Bob Aiken's MORPHNET model from the DoE). If NSF can create
a vision that supports both of these while moving from a backbone
investment closer to the edges of the service (for example, funding
applications but providing guidelines and certifications for services...)
this vision will create the model that supports the critical mass...
d. This may be a hard question... There are so many conflicting
requirements in this field. Latency is hard... QoS is hard today,
unless one overbuilds and dedicates services. This is changing
but it will be a while before it's done... We have succeeded in
the past and present by overbuilding and heavily managing. The
vBNS is a model of this... The Abilene network is repeating this
model. There are other models that may work as well... More on
this later...
2.a This is very hard... I think the only way to garner continued
support is to create a business model that allows a university
to put more services into this high speed bucket and take money
from other pots to fund these initiatives, while not giving up
services... It's this leveraged service model that I support for
the future. The universities are seeing a huge increase in use
of ISP services and costs for these are tremendous... The quality
of these services is not good... Our core business is not being
served as well as it should be by these services for these dollars.
If we can use the new initiative and critical mass to create an
"Intranet for education and research" across the country, moving
all ISP type traffic onto virtual networks supported by the same
low level services as the research networks, we win... If we can,
in the future, funnel some part of our voice services, especially
that which is between our partnering universities over this same
infrastructure, if we can utilize the network that we're building
to bring competition in commodity network services to underserved
areas, we win... If we can eventually utilize the same infrastructure,
a very robust and redundant and high performance low level infrastructure
to create the virtual networks that we need for high performance
computing and network research along side of these commodity services,
building competition and pushing commoditization, we all win big.
b. See the above diatribe... I think this is the major reason
to combine these services over the low level infrastructure...
For example, ATM allows one to utilize VBR services to provide
guaranteed bandwidth when necessary and use UBR services on the
same pipe but a different virtual circuit to fill the pipe when
underutilized with commodity or at least other non-priority traffic...
This is the win... MPLS may in the future provide these same services
over an IP network...
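The pipe-filling idea can be illustrated with a toy allocation model.
This is not ATM signalling or any vendor API: the function share_pipe,
the flow names, and the 155 Mbps capacity are assumptions made for
illustration. Guaranteed flows (standing in for VBR circuits) are
served first, and best-effort flows (standing in for UBR) split
whatever is left.

```python
def share_pipe(capacity, guaranteed, best_effort):
    """Split one pipe between guaranteed and best-effort flows.

    guaranteed and best_effort map a flow name to its current demand.
    Assumes admission control already keeps guaranteed demand within capacity.
    """
    alloc = {}
    remaining = capacity
    # Guaranteed flows are served first, up to what they actually demand.
    for name, demand in guaranteed.items():
        grant = min(demand, remaining)
        alloc[name] = grant
        remaining -= grant
    # Best-effort flows fill the leftover capacity with an equal (max-min) split.
    pending = dict(best_effort)
    while pending and remaining > 1e-12:
        fair = remaining / len(pending)
        satisfied = {n: d for n, d in pending.items() if d <= fair}
        if not satisfied:
            # Everyone wants more than the fair share: give each the fair share.
            for name in pending:
                alloc[name] = fair
            pending = {}
            break
        # Flows asking for less than the fair share get their full demand.
        for name, demand in satisfied.items():
            alloc[name] = demand
            remaining -= demand
            del pending[name]
    for name in pending:
        alloc[name] = 0.0  # nothing left for these flows
    return alloc


commodity = {"web": 40, "mail": 40}
busy = share_pipe(155, {"experiment": 100}, commodity)  # research burst active
idle = share_pipe(155, {"experiment": 0}, commodity)    # research flow silent
```

When the experiment is active the commodity flows are squeezed to an
equal share of the remainder; when it is silent they get everything
they ask for, so the pipe is not left empty.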
c. I think we will eventually have to do this, but when you charge
early on for services like this, you dis-incent the users of the
services... What we need is to get the win by combining services
and do flat rate pricing in the early stages of the high performance
networking and computing, similar to what we did with internet...
Today we're paying by the packet, basically and we're all struggling
with how to cut back, not how to innovate...
3.a This could be great or it could be bad... If the applications
and middleware builders are strongly encouraged to combine their
resources, say regionally, and support networks that can do the
above leveraging, then we all win... If the apps and middleware
folks can put up point to point network links that serve only
their app and go away when the project is done, we all lose. We
don't get the benefit of driving competition and the commercial
markets to where we want to be and miss out... When the NSF net
was sold, education and research were 95% of the users... After
only a couple of years, education and research were less than
5% and now shrinking further... Who gets heard by the commercials?
The commercials, not education and research. Now we have the attention
of some of the larger commercial entities since they are seeing
the need for more creative energy from the ed/research side to
get them the next steps... We had better leverage that and build
something that we can keep for the future... This is the basic
idea behind Internet2, though I think the current implementation
of I2 needs some modification in order to succeed ultimately...
b. Of course, the QoS research that's going on will be necessary...
The continued convergence between ATM and IP that's happening
is a good thing... Security is a most necessary part of all this...
We've ignored it too long at the expense of our campuses today...
Security incidents are becoming the most time consuming part of
our jobs...
I also think that we need to pursue cross admin domain functionality.
The vBNS and Abilene don't really address this issue, especially
where ATM is concerned and signalling across these infrastructures...
AT&T has stated that they will not pass signalling to any other
network providers. This is a safe thing to do, but it's not the
way to get us into the future... Middleware for collaborations
is another key element that's missing today... Until we get this
we'll have to travel and that's expensive...
4.a As I said above, I think building a general infrastructure
that can be leveraged to quickly assemble experimental testbeds
and infrastructure to support research is necessary... Researchers
generally don't want to wait for 60-90 days and deal with phone
company procedures to get circuits up to their peer institutions...
What the PI's need is some general understanding of the overall
area... Most folks want to do the right thing if they understand
that it will not hinder what they are doing and will help others.
This is hard sometimes to get across... If NSF awards do not require
PIs to put their connectivity dollars into a generalized infrastructure
for the common good of research (and maybe save some dollars too
in the process), some will still do this, and some will not...
Education is probably the most effective way to get consistency
in this matter...
b. Priorities in my view are to build a vision that will take
us into the future, leveraging research, education, and commodity
services to create the critical mass to build the networks that
we need, sharing these to solve our service needs. They can be
financed by paying for networking via awards to applications and
middleware, but this connectivity must be arranged so that the
leveraging actually happens, rather than being spent on individual
short-lived services... Hope this makes some sense in light of the previous discussion...
1.a PACI is an NSF-wide program and must continue to have strong
ANIR support. The perception has been that NSF/ANIR covers backbone
connections and this is sufficient for the Partnerships. Perceptions
are beginning to change however, as witnessed by the results of
the Site Review in October, the NPACI Program Plan's Appendix
B (Networking), and the joint Alliance/NPACI Proposal to NSF entitled,
"Building and Applying a National-Scale Grid: A PACI-wide Next
Generation Internet Testbed".
b. An important issue that needs to be addressed is guaranteed
and reserved bandwidth, as well as bandwidth usage coordinated
with resource usage. For example, suppose an astronomer has a limited
amount of time reserved on a supercomputer and a telescope in
order to make his/her observations, and the area to be observed
depends on certain observable data; bandwidth availability
would then be crucial. With guaranteed/reserved bandwidth, the astronomer
could analyze incoming data in order to accurately choose his/her
coordinates, and continue his/her experimentation in real time.
A lapse in communication due to lack of bandwidth could cause
the researcher to waste time on data that is irrelevant, and perhaps
useless, to his/her specific research.
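The coordination described above (compute time, instrument time, and reserved bandwidth all lining up) can be sketched as a search for a common time window. This is only an illustration: the function name, the hour-based windows, and the sample availability values are all assumptions, not part of any existing PACI scheduler.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start_hour, end_hour), end exclusive


def common_window(schedules: List[List[Window]], min_hours: int) -> Optional[Window]:
    """Return the earliest window of at least min_hours present in every schedule."""
    common = schedules[0]
    for sched in schedules[1:]:
        overlap = []
        # Intersect every window found so far with every window of the next resource.
        for a_start, a_end in common:
            for b_start, b_end in sched:
                start, end = max(a_start, b_start), min(a_end, b_end)
                if start < end:
                    overlap.append((start, end))
        common = overlap
    for start, end in sorted(common):
        if end - start >= min_hours:
            return (start, end)
    return None


# Hypothetical availability, in hours from midnight (illustrative values only).
supercomputer = [(0, 6), (20, 24)]
telescope = [(2, 5), (21, 23)]
network_reservation = [(3, 8), (22, 24)]

slot = common_window([supercomputer, telescope, network_reservation], min_hours=2)
print(slot)
```

If no window satisfies all three resources, the function returns None, which corresponds to the case where the astronomer's supercomputer and telescope time cannot be backed by a bandwidth reservation.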
c. PACI is an NSF-wide program and must continue to have strong
ANIR support. Ultimately, it is the NSF's responsibility to support
the research that the private sector will not. With regard to the
user community, i.e. industry and researchers, the NSF may not
play a major role, but the commercial world does not support network
researchers. In other words, network researchers worry about performance
while the private sector worries about price over performance.
Therefore, for network researchers, the NSF is the main source
to look towards for funding for this type of research.
d. N/A
2.a Campuses maintain strict control over their campus infrastructures
and the network connections to those campuses. PACI should serve
as a resource to assist these campuses to better meet the needs
of PIs participating in PACI. Relationships need to be cultivated
with the CIOs on campuses to assist them in better understanding
the benefits of PACI partnership and of upgrading the campus infrastructure
to levels enabling PIs to utilize high performance networks. Beyond
PACI serving as a broker, this is an issue of commodity vs. high-end
experimental networking that will be funded by the private sector
or government agencies, respectively.
b. Use and publicize appropriate metrics; do not promulgate the
24-hour bandwidth usage.
c. Yes, institutions will move toward cost-recovery, and high-end
experimental networks will be left 'holding the bag', in other
words, looking for funding from other sources such as the NSF.
3.a Five years from now, people who use the high performance networks
will be divided into two sub-groups, network researchers and researchers
who use the high performance network. The first group's intellectual
curiosity will be to study the actual workings of the high performance
network, i.e. routers, switches, backbone, bandwidth, and speeds
of packet transfers. The second group will merely be using the
high performance network to satisfy their intellectual curiosity
that focuses on disciplines including, but not limited to, computer
science, astronomy, biology, marine science and chemistry. The
experiments that both groups will be working on will change over
the next five years, and some of them we cannot even begin to imagine
today.
b. In five years, the network will possess many differences including
greater bandwidth and features that do not currently exist. These
features might include 'quality of service' technologies, reservations
for bandwidth, guaranteed bandwidth, and end-to-end security.
4.a Alliance and NPACI responsibilities would be to serve as a
broker for PIs, to set performance levels, to provide support
and education analogous to the Scientific Computing Group, and
to develop a form of certifying the performance of partners.
b. This is also an issue of commodity vs. high-end experimental
networks. Each will be financed from separate sources.
1. The primary barrier to better utilization of advanced distributed
resources is the lack of ubiquitous and interoperable middleware.
While there is much discussion about what is in middleware, a
rough hierarchy can be identified, with underware such as identity,
multicast and DNS, basic middleware such as authentication, directories
and authorization, and advanced middleware such as shared state
for collaboration, coscheduling of distributed resources, etc.
(The hierarchy is based upon dependencies.) Underware, with its
close association with local physical infrastructure, can be assumed
to be a campus-provided service. (Obviously there are significant
off-campus issues in each of these areas.)
It is essential that the basic middleware for distributed research
groups work within the context of the primary institutions that
researchers belong to. Such utilities as authentication and directories
for PACI researchers ought to be consistent, in approach and user
interface if not in implementation, with their campus utilities.
There needs to be an alignment between the critical if pedestrian
needs of campuses for such middleware and the activities of PACI.
Advanced middleware is not likely to be developed on campuses,
given both the press of business and its lack of immediate relevance
to broad campus needs. Thus advanced middleware is appropriate
grist for PACI and other advanced computing and collaborative
organizations. In the development of this upper middleware, care
should be taken to build upon the campus-based basic middleware.
NSF could have two vital roles in these developments: to nurture
the development of basic middleware on campuses and to directly
support the creation of the advanced middleware needed for distributed
research activities. Nurturing the development of basic middleware
on campuses must recognize that the challenges are not only technical,
but also in design and policy. Thus a three-pronged approach,
proceeding in parallel, is needed to create or shape (in the case
of commercial product development that may not meet the unique
needs of academia) basic technologies, to identify effective campus
design strategies (such as directory schema, security zones, etc.)
and to push the policy stakeholders on campus (such as registrars
and legal counsels) to resolve the complexities that FERPA, Open
Records, etc. introduce.
SLAs will have important but limited roles. It will be important
for campuses to validate performance characteristics from the
campus egress to remote resources. On campus, however, a variety
of ad hoc approaches to network performance (using primarily overprovisioning)
will likely mean that SLAs will not be used locally.
2. It is worth some deeper analysis to see how the network traffic
patterns of researchers fit within broader campus patterns. On
larger campuses, assuming 6 Mbps per average desktop, 20,000
ports add up to 120 Gbps. In this perspective, researcher use spikes
may be less significant than in today's traffic. The moral may
be to raise the rest of campus usage so that research use is less
conspicuous.
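The back-of-the-envelope figure above can be reproduced in a few lines. This is only a sketch: the 6 Mbps per desktop and 20,000 ports come from the text, while the function names and the 622 Mbps research burst are assumptions chosen purely for illustration.

```python
def aggregate_demand_gbps(mbps_per_port: float, ports: int) -> float:
    """Total campus demand in Gbps if every port draws mbps_per_port at once."""
    return mbps_per_port * ports / 1000.0  # 1 Gbps = 1000 Mbps


def burst_share(burst_mbps: float, aggregate_gbps: float) -> float:
    """Fraction of the aggregate that a single research burst represents."""
    return burst_mbps / (aggregate_gbps * 1000.0)


# Figures from the text: 6 Mbps per average desktop, 20,000 ports.
campus_gbps = aggregate_demand_gbps(6, 20_000)

# Hypothetical research burst (an assumption, not a figure from the text).
research_burst_mbps = 622
share = burst_share(research_burst_mbps, campus_gbps)

print(f"Aggregate campus demand: {campus_gbps:.0f} Gbps")
print(f"Research burst share: {share:.2%}")
```

Under these assumptions a single research burst is a small fraction of the campus aggregate, which is the point the paragraph makes about research use becoming less conspicuous.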
But if it is the case that research network traffic does indeed
require consequential funds, then the next step is to examine
the location and type of the costs. If the cost is at the egress
(say in purchase of premium traffic service) then a set of yet-unknown
issues (e.g., does the premium service also improve the regular service,
is it priced per use, etc.) will determine institutional policies.
There are not likely to be sophisticated QoS charging schemes until
management tools improve greatly. The costs that are in the campus
infrastructure (for example running gigabit Ethernet to a desktop
from a special backbone), are likely one-time and can be promoted
through traditional academic processes.
3. It is worth considering what the ubiquitous campus computing
environment will be like in five years and what infrastructure
is needed for that. Identity and authentication services will
need to be deployed to support out-sourced modem pools, library
database accesses, and on-line student transactions. Authorization
will be needed to drive workflow systems and manage transaction
systems. QoS software will be used to meter usage of external
resources for distance education. Directories will enable students
to sit down at any keyboard in a public computing lab and have
their bookmarks and aliases readily available. Distributed file
systems will put their "locker" at their desktop. Administrative
applications will be driven by centralized middleware and, in
turn, feed those repositories with automated feeds and updates.
It is important to note that the services above must be multiplatform,
not only in the clients that they serve, but in the core servers
that operate the services. For example, identity for users must
span the traditional Unix orientation to serve for modem pools
and for NT servers. The services must have applications that use
them in almost all areas; for example the lack of a secure ftp
could compromise the security of all other apps, since users tend
to use the same password in secure and insecure environments.
The services must also have interoperability with each other and
with different campuses. It would be useful to have public domain
implementations of key services and applications.
4. The Alliance and NPACI are in a difficult position, hampered
in an area that they have little opportunity to remedy. Moreover
it is frustrating that the technical challenges are often overwhelmed
by issues of policy and scaling.
The Alliance and NPACI should focus on creating the specialized
superstructures necessary to support advanced applications and
seek to influence campus development of the basic middleware services.
They should facilitate the designation of top-level service
providers (for certificates, for directories, for name service,
etc.) in the key areas where Alliance and NPACI customers are
located (.edu, .gov, etc.). They should also work for technology
transfer mechanisms in areas specific to advanced networking and
computing, such as tuning high-performance workstations, effective
use of QoS, etc.
1. Commodity access will always lag the demand of high end users
whose very demands shape the emerging generation of routine applications.
A mix of unique and commodity resources will be required. Incrementally
lower-cost services won't drop prices fast enough to satisfy
the needs. ANIR must focus on joint ventures between industry,
university and government research labs. It is the only way to
leverage the best of each domain and get the cost effective deployment
of broadband network resources. It is further the fastest way
to migrate good ideas into volume, commodity deployment. Unfortunately,
in general, legislative relief from the Federal Acquisition Regulations
may be required to foster the collaboration.
2. When NSF/ANIR phases out its direct support of high performance
connections, the funding increases to individual PIs must compensate,
and a competitive / collaborative infrastructure must exist to
guarantee that the resources are available to compete for the
moved "network dollars."
Again, I feel that the availability of anything beyond incremental
price / performance improvements will be dependent upon a government/
industry partnership in which government shares the risk for initial,
far reaching technology platforms upon which the ideas needed
in five to ten years are tested in the bright light of a broadly
available, large fast infrastructure that complements the incremental
price/performance advances of commodity services.
3. Visionaries succeed in articulating a vision. They promulgate
the vision widely and engage the R&D community in implementing
its components. Success is achieved when the implementations are
"ubiquitous" -- high volume production and use with easily mastered
human interfaces. A capability is ubiquitous when you notice its
absence rather than its presence.
Political decisions will affect resource siting. Independently
of that, the resources that will be clustered at advanced computing
sites will reflect the current technology about to be deployed
in 'volume' production. In five years the computer science research
community will be using the tools produced in volume as a result
of today's visions, implementing the current vision and formulating
the next. The implementations will display the continued cycles
of concentration and distribution during the evolution.
Central to this flow of computing and storage between central
sites and remote sites are the networks -- the bigger and faster
they are, the more rapid the cyclic evolution in which new capabilities
are alternately centralized or distributed depending upon the
current technology. Central to deployment of the networks is the
application driven business case required to justify funding the
infrastructure. It will necessarily lag the computer science R&D
but both communities will benefit from promulgating needs and
capabilities into each other's domains.
In my vision, higher and higher speed networks eliminate the need
for workers to be tethered to any given site to access either
colleagues or resources. Further, the business community has recognized
the value of such collaborations and willingly participates and
contributes resources since they believe it is in their best interests
to do so.
4. The only resource that consistently requires centralization
is expertise. Further, there isn't enough to go around. I believe
that every effort should be made to centralize training, education
and support -- using network technology to distribute that 'central'
function so that the "sun never sets" on the "help desk." The
tools are just now emerging (but not yet "ubiquitous") to implement
the "distributed center." I'd invest in fixing the human interface
to that capability to eliminate as many impediments between a
scientist or student and his or her resources as possible.
1.a-c NSF's financial support of the actual transit pipes should continue for key resource centers and sites where cutting edge research requires extensive bandwidth. But equally important, NSF should now take steps to extend its assistance (and equivalent expectations) to end-to-end support of the advanced networking infrastructure. These end-to-end requirements range from:
Addressing these points will help to ensure that community resources (particularly NSF-sponsored PACI resources) are accessible and usable by distributed researchers. It will also require that NSF/ANIR expand its traditional definition of infrastructure support (currently synonymous with transit/backbone support) to include enabling technologies and services and end-to-end networking.
d. In terms of Service Level Agreements, large commercial customers are beginning to insist on SLAs with their providers and are establishing means of monitoring or verifying that they are receiving the promised levels of service. Academia will probably follow suit soon. The question is whether a university's SLA (for its commercial Internet service and/or R&E network service) will be determined solely by the office of the CIO. If PACI researchers and resource sites have measurable service level requirements, then they need to articulate these requirements and ensure that these requirements are addressed during provisioning of the campus infrastructure and negotiations with network providers.
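As a rough illustration of what "monitoring or verifying" a promised service level might look like, the sketch below compares throughput samples against a promised rate. Everything here is an assumption made for the example: the function name, the 95% compliance threshold, the promised 40 Mbps, and the sample values are hypothetical and do not describe any actual SLA.

```python
from statistics import mean


def sla_report(samples_mbps, promised_mbps, min_fraction_ok=0.95):
    """Summarize whether measured throughput samples meet a promised level."""
    ok = [s for s in samples_mbps if s >= promised_mbps]
    fraction_ok = len(ok) / len(samples_mbps)
    return {
        "mean_mbps": mean(samples_mbps),
        "fraction_ok": fraction_ok,
        # Compliant only if enough individual samples met the promised rate.
        "compliant": fraction_ok >= min_fraction_ok,
    }


# Hypothetical throughput measurements in Mbps (illustrative values only).
samples = [44.1, 45.0, 43.8, 30.2, 44.9, 45.0, 44.7, 44.5, 12.0, 44.8]
report = sla_report(samples, promised_mbps=40.0)
print(report)
```

Judging compliance by the fraction of samples that meet the promise, rather than by the mean alone, keeps occasional severe drops from being averaged away; a real agreement would define its own metric.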
2. While NSF support for the advanced networking requirements of some researchers and support for the community's access to critical resources should continue, it is important that universities start the process of redefining how they approach their networking costs. Some suggestions include:
3. The pace of technological advancement in the Internet is continuing to accelerate, making networking research increasingly critical and increasingly difficult. Any aids with respect to middleware and related tools are very important. Middleware-ish technologies like caching and information retrieval aids are also vital to users being able to easily locate and retrieve information. Another area of great importance to our group (CAIDA), and to other distributed/collaborative organizations, would be more effective collaboratory tools (interactive audio/visual/data capabilities) and more intuitive interfaces to important technologies such as multicast.
4. PACI (NSF, NPACI and Alliance) have a responsibility for making the partnerships work. Networking should be viewed as the vital link connecting PACI's disparate parts. This calls for direct support of all partnership participants, from desktop tools, to applications-design tools and support for researchers, to LAN and WAN support for researchers (e.g., end-to-end high performance h/w, s/w, and connection -- as well as technical support to ensure it works), to education and feedback for an emerging community of users of advanced networks and of developers of applications to run on those networks.
1. I am unhappy with the primary assumption here. Would not the
greatest contribution ANIR could make be to support the establishment
and operation of one or more high-speed networks designed to provide
just these services (metacomputing, for instance)? What other
infrastructure could ANIR provide that would help in this regard?
The most critical end-to-end issues (accepting then the assumption
that high-speed long-haul traffic will be carried commercially)
will undoubtedly be the usual last-mile problems of connections
to and within campuses. This is where ANIR needs to work with
local campus networking in order to ensure that the access needed
is provided. As far as NSF's responsibilities are concerned, I
think there is a perception problem here that, for instance, the
PACI program is an ACIR program: fundamentally, it is an NSF program
that affects essentially every office within the Foundation. Once
this is recognized, it becomes clear that a national networking
program is an NSF program, not merely an ANIR program. In this
sense NSF should have the responsibility of ensuring that the
networking infrastructure can support the other programs (e.g.,
metacomputing within PACI) as required. I am not qualified to
comment on SLAs.
2. As noted above I have very mixed feelings about the basic premise here. Accepting it for the purpose of discussion, the answer to the first subquestion is straightforward: the usual methods. That is, the more politically skilled or glib groups will be more successful than the less glib. I think it is unlikely that discipline-specific funding could be used for this sort of thing: that is, I doubt that one could persuade, say, NSF's Chemistry Division to pay this for computational chemistry activities.
The way to justify the use of such networks in a "bursty" mode is to point to the quality of the science, etc., being performed. This is not dissimilar to having to justify idle time on a large computer generated by reserving most or all of the machine for a specific large calculation. In that case the research has undergone a peer-review process that indicates its worth, and this process can factor in the consequences such as idle time or delaying other users. I can only speculate as to what institutions will do, but it seems likely that campus networking directors will turn to whatever is needed to balance their budgets. Cost-recovery is a pretty easy approach from their point of view --- it is rarely successful in getting the best science done and is sometimes strongly counterproductive, however.
3. Our (quantum chemistry's) main benefit would be the ability to run metacomputing calculations that could take advantage of computational resources at many sites. This would increase the total computing power available to us to perhaps 100 TFLOPS, assuming that ASCI-class sites were accessible as part of this. For methods with good scaling, this could mean an order-of-magnitude increase in the size of systems that could be studied. The benefits would be priceless. The main requirements are metacomputing software and the networks!
4. This is completely bound up with my response to Q1. If NSF
shoulders its responsibilities and supports the basic infrastructure,
the PACI partnerships can do much of the rest, especially the
education and training component. If NSF will not shoulder its
responsibilities, the partnerships obviously can do little more
than help the users cope with an inadequate infrastructure.
1.a I'm not sure how to answer this. Clearly the centers need to keep in close touch with the research groups to identify problems, which may not be the problems that are expected.
1.b The chief concern of most computational scientists is high bandwidth data transfers and interactive access to remote sites without undue delays. Reliability, in the sense of always being able to make such transfers, is important. Simplicity of use and uniformity of interfaces across all the machines in use are important. Security is an important issue -- most of us have lost research time while various machines were fixed after being hacked.
1.c I am personally skeptical about commercial carriers' ability to provide this level of service, or their interest in doing so. I suspect that NSF could make the scientific community more productive by continuing to play an active role in "infrastructure" type networks, in addition to "research about networks".
2.a The best way to handle this is to emphasize the wide range of usage levels, extending from the few groups that regularly require real time video on down to the many people requiring smaller resources, such as Netscape connections to the preprint servers. This has implications for the design of the network --- a network serving only a small number of users is unlikely to get support.
2.b I can't help you on this one!
2.c I suspect there will be such moves. The most recent proposal at my institution was for a monthly charge per IP address.
3.a Most of the computation-intensive applications in my discipline (theoretical physics) are sensitive to latency, and so are unlikely to be done on widely distributed systems. What should be in place in five years is an environment where codes and data can be conveniently moved to the available computing engines. Of course, this is possible now; the key is to develop the software and hardware infrastructure to make it easy to do.
4.a I would hope that Alliance and NPACI would play important
roles in all aspects of developing and operating networks for
computational scientists. One aspect which is not touched on elsewhere
in these questions is that Alliance and NPACI are particularly
well placed to evaluate new products, make them easily usable,
provide documentation, distribute them to researchers, and encourage
their adoption. The classic historical example of this is Mosaic.
More recently, I downloaded and installed the secure shell package
from SDSC.
| for more information: info @ caida.org | last update: Mar 3 20:30:47 1999 |
| this page: http://www.caida.org/PACI/index.html | |