POSTED: August 16, 2017

High Performance Computing Systems Administrator (Operating Systems Programmer/Analyst 2 or 3)

University of Connecticut
Storrs, CT
Full time
Technical (Programmer, Developer, Analyst)

Under the general supervision of the Team Lead for Research Technology, these positions are responsible for the day-to-day operations of the university’s large-scale research computing platform on the Storrs campus.

Incumbents in this position are considered either technical specialists or technical experts, depending on their level, and are expected to have expertise with large-scale integrated computing systems. Individuals are expected to apply a wide range of problem-solving and resource management techniques to their work, which is of moderate to complex difficulty. Incumbents are expected to carry out projects of moderate to large size and complexity with minimal supervision.

The HPC cluster at UITS serves as the catalyst for research in computationally intensive disciplines, in alignment with the university’s ambitious initiatives – UConn Technology Park, Next Generation Connecticut, and Bioscience Connecticut. The cluster has over 6,000 CPU cores, a one-petabyte parallel file system (GPFS), and an Infiniband interconnect.

Duties and Responsibilities

• Install, configure, and maintain Linux operating systems (RHEL v6/v7).
• Perform operating system upgrades, patching, troubleshooting, performance tuning.
• Manage the lifecycle of Linux-based scientific applications, including compiling from source and troubleshooting.
• Manage the HPC scheduling software layer (SLURM) and related tools.
• Install and manage configuration, monitoring, and notification tools.
• Administer an Infiniband fabric and perform basic Ethernet network administration.
• Maintain and troubleshoot hardware.
• Interact with university researchers on various topics, including the use of existing services, service policies, and research requirements.
• Provide excellent technical support and training to a diverse user base.
• Create and maintain clear and effective technical documentation.
• Interact with vendors, assessing products and making purchasing recommendations.
• May supervise students or employees.
• Perform related duties as required.

Appointment Terms

This is a full-time, permanent position. The University offers a competitive salary, outstanding benefits, including employee and dependent tuition waivers at UConn, and a highly desirable work environment. Salary and level will be commensurate with the successful candidate's background and experience.

To Apply

Please apply online at Staff Positions. Interested candidates should submit a letter of application and a resume that demonstrate how they meet the minimum qualifications and any preferred qualifications for this position, along with a list of three professional references with contact information, including phone numbers. Reference search # 2018042. Screening will begin immediately.

This job posting is scheduled to be removed at 11:59 p.m. Eastern time on September 24, 2017.

Minimum Qualifications

1. 2 or more years of experience in the hands-on management and troubleshooting of Linux systems (RHEL, CentOS, etc.).
2. Demonstrated experience in software installation, compilation (GCC, Intel ICS, etc.), and troubleshooting in a Linux environment.
3. Demonstrated experience using scripting/programming (Bash, Python, etc.) in support of systems operations.
4. Demonstrated commitment to providing excellent technical support to a diverse user base.
5. Good organizational skills and attention to detail along with good written and oral communications.
6. The ability to work with minimal supervision.
7. The ability to work effectively with staff and users at all levels, vendors and other technical staff and as a member of a team.
8. Demonstrated ability to meet deadlines and work under pressure.
9. Demonstrated ability to lead projects.

Preferred Qualifications

1. Experience installing and troubleshooting enterprise hardware platforms, such as servers and storage.
2. Experience with HPC infrastructure components such as job schedulers (SLURM, LSF, etc.), environment management (Modules, Lmod, etc.).
3. Experience managing large-scale data storage systems, preferably parallel file systems such as GPFS, Lustre, etc.
4. Experience in systems automation (DevOps) using tools such as Ansible, Puppet, Chef, etc.
5. Maintenance and troubleshooting of an Infiniband fabric.
6. Certifications relevant to this position.

1. Bachelor’s degree in Computer Science, Computer Engineering, or closely related field; or equivalent combination of training and experience and 2 or more years’ experience in a large scale computing environment.

The University of Connecticut has been named the top public university in New England for over a decade and is ranked among the top public universities in the nation. The University of Connecticut is also a Carnegie Foundation Research University, a prestigious honor shared by only the nation's top higher education institutions.