POSTED: August 16, 2017

High Performance Computing Storage Administrator (Operating Systems Programmer/Analyst 2)

University of Connecticut
Storrs, CT
Full time
Technical (Programmer, Developer, Analyst)

Under the general supervision of the Team Lead for Research Technology, this position is responsible for the day-to-day operations of the university’s large-scale research storage platform on the Storrs campus.

Incumbents in this position are considered technical specialists and are expected to have expertise with large-scale integrated computing systems. Individuals are expected to apply a wide range of problem-solving and resource-management techniques to their work, which is of moderate to complex difficulty. Incumbents are expected to carry out projects of moderate to large size and complexity with minimal supervision.

The HPC cluster at UITS serves as the catalyst for research in computationally intensive disciplines, in alignment with the university’s ambitious initiatives – UConn Technology Park, Next Generation Connecticut, and Bioscience Connecticut. The cluster has over 6,000 CPU cores, a one-petabyte parallel file system (GPFS), and an InfiniBand interconnect.

Duties and Responsibilities

• Install, configure, and maintain storage systems for use in a research-computing environment.
• Install, configure, and maintain parallel file systems (DDN/GPFS), and support their interaction with tiered-performance NFS.
• Monitor usage of storage resources and make recommendations to maintain service standards.
• Interact with integrated systems in an HPC environment, including job schedulers and InfiniBand fabrics.
• Interact with configuration, monitoring, and notification tools.
• Perform hardware maintenance and troubleshooting.
• Interact with university researchers on various topics, including the use of existing services, service policies, and research requirements.
• Provide excellent technical support and training to a diverse user base.
• Create and maintain clear and effective technical documentation.
• Interact with vendors, assessing products and making purchasing recommendations.
• May supervise students or employees.
• Perform related duties as required.

Minimum Qualifications

1. Two or more years of recent experience managing large-scale file systems, preferably parallel systems such as GPFS or Lustre.
2. Experience installing and troubleshooting enterprise data storage hardware platforms.
3. Demonstrated experience using scripting/programming (Bash, Python, etc.) in support of systems operations.
4. Demonstrated commitment to providing excellent technical support to a diverse user base.
5. Good organizational skills and attention to detail, along with good written and oral communication skills.
6. The ability to work with minimal supervision.
7. The ability to work effectively with staff and users at all levels, vendors, and other technical staff as a member of a team.
8. Demonstrated ability to meet deadlines and work under pressure.
9. Demonstrated ability to lead projects.

Preferred Qualifications

1. Experience working in an HPC operations team.
2. Experience in the hands-on management and troubleshooting of Linux operating systems (Red Hat, etc.) and applications.
3. Experience in systems automation (DevOps) using tools such as Ansible, Puppet, or Chef.
4. Maintenance and troubleshooting of an InfiniBand fabric.
5. Certifications relevant to this position.

Bachelor’s degree in Computer Science, Computer Engineering, or a closely related field; or an equivalent combination of training and experience, and two or more years’ experience in storage administration in a large-scale computing environment.

The University of Connecticut has been named the top public university in New England for over a decade and is ranked among the top public universities in the nation. The University of Connecticut is also a Carnegie Foundation Research University, a prestigious honor shared by only the nation's top higher education institutions.