HPC Center Operational Overview

High Performance/Research Computing Centers

An Operational Perspective

1) Examples

  • National Labs
    • Sandia
    • Argonne
    • Oak Ridge
    • Lawrence Berkeley
    • Los Alamos
  • Supercomputing Centers
    • Pittsburgh Supercomputing Center
    • San Diego Supercomputer Center
    • Texas Advanced Computing Center (TACC)
  • DoD MSRCs (Major Shared Resource Centers)
  • Mid-Size University
    • Notre Dame Center for Research Computing (ND CRC)
  • Corporate
    • Aerospace Industry
    • Materials Science
    • Wall Street
    • Pharmaceutical
    • Petrochemical
  • Broader Scope: Data Center, Colocation, Extreme-Scale Computing (Microsoft, NSA, Google)

2) Organizational Structure

  • Resource Management
  • User Support
    • 720 Users
    • ~200 active in an average month
  • R&D (Operational)

3) Resources

  • Hardware
    • CPU, GPU, FPGA
    • RAM
      • DDR2 mostly
    • Disk
  • Software
    • provisioning/cluster management
      • xCAT
    • operating system
      • Linux
      • TOP500: [1]
    • middleware and libraries
    • applications
    • acquisition
    • maintenance
  • Network
    • Gigabit Ethernet
  • Storage
    • AFS
    • Panasas?
  • Rendering and Visualization

4) Facilities

  • Uptime (Resiliency, Reliability, etc.)
  • Security
  • Access/Function
  • Network Bandwidth
    • Bandwidth to Cook the Pizza
  • Scalability

5) Costs

  • Personnel
  • Equipment (CapEx)
  • Operating Costs (OpEx)
    • Often larger than capital equipment costs over a 3-year life cycle
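
The comparison between operating and capital costs over a 3-year life cycle can be sketched with simple arithmetic. The snippet below is only an illustration: every input (IT load, PUE, electricity price, maintenance fee, purchase price) is a hypothetical number that does not come from this outline, and operating cost is assumed to consist of energy plus maintenance, with personnel counted separately as in the list above.

```python
HOURS_PER_YEAR = 24 * 365


def energy_cost(it_load_kw, pue, usd_per_kwh, years):
    """Electricity cost for the IT load, scaled by PUE to include cooling and distribution."""
    return it_load_kw * pue * HOURS_PER_YEAR * years * usd_per_kwh


def lifecycle_opex(it_load_kw, pue, usd_per_kwh, annual_maintenance_usd, years=3):
    """Operating cost over the life cycle: energy plus maintenance contracts."""
    return energy_cost(it_load_kw, pue, usd_per_kwh, years) + annual_maintenance_usd * years


# Hypothetical inputs, chosen only to illustrate the arithmetic.
capex_usd = 1_000_000
opex_usd = lifecycle_opex(
    it_load_kw=200,            # assumed average IT power draw
    pue=1.8,                   # assumed power usage effectiveness of the facility
    usd_per_kwh=0.10,          # assumed electricity price
    annual_maintenance_usd=100_000,  # assumed hardware/software maintenance per year
    years=3,
)

print(f"CapEx:        ${capex_usd:,.0f}")
print(f"3-year OpEx:  ${opex_usd:,.0f}")
print(f"OpEx / CapEx: {opex_usd / capex_usd:.2f}")
```

With these made-up inputs, 3-year operating cost comes to roughly 1.25 times the purchase price; a site with cheaper power or a lower PUE could just as easily land below 1.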

6) Userbase

  • Diverse by nature of research
  • Tweak, tune, break, fix, repeat... (you never reach steady state)