Note: This hoteling policy is for Georgia Tech faculty and staff who want to colocate their equipment with CRNCH RG resources.
What is Rogues Gallery Hoteling?
Hoteling is a term used in compute infrastructure to indicate that a researcher has bought into a shared resource using their own funds. Hoteling is usually implemented for two reasons: 1) to provide the basic infrastructure required by all researchers (networking, storage, scheduling, etc.) and 2) to help manage scarce cycles, especially on novel compute hardware. In the case of Rogues Gallery, we focus on supporting novel architectures, which are in many cases both expensive and hard for most researchers to access.
In this vein, Rogues Gallery hoteling provides the following benefits:
- A common, backed-up /nethome with a standard 50GB quota per user
- Shared project storage on a research group basis
- Access to a 40TB non-backed-up /netscratch
- Support for 10/40GE networked access to cluster resources
- Slurm scheduling with dedicated queues to maximize your cycles on any contributed resources
- Systems-level access (i.e., sudo, IPMI) to your own servers for advisors and senior graduate students
- Appropriate documentation noting your ownership of the resources, as well as a landing page for new students using the resources
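As a sketch of what the dedicated-queue benefit looks like in practice, a user in a contributing lab might target that lab's partition in a batch script like the one below. The partition name `mylab-owner` and the script contents are hypothetical placeholders, not actual RG queue names; real queue names are assigned when your hardware joins the cluster.

```shell
#!/bin/bash
# Hypothetical Slurm batch script targeting a contributor's dedicated queue.
# "mylab-owner" is a placeholder partition name, not an actual RG partition.
#SBATCH --job-name=novel-arch-test
#SBATCH --partition=mylab-owner   # dedicated queue for contributed nodes
#SBATCH --nodes=1
#SBATCH --time=01:00:00

srun ./my_experiment
```

Such a script would be submitted with `sbatch job.sh`, and `squeue -p mylab-owner` would then show jobs pending or running on the contributed nodes.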
How do I get involved?
To join the RG hoteling agreement, we ask for written confirmation via email that your servers/equipment may be incorporated into the combined cluster infrastructure. This gives you and your users access to the benefits listed above. We likewise ask for written confirmation should you decide to leave the CRNCH hoteling arrangement and set up your infrastructure as an independent resource.
Servers will need to run the following services to be part of RG hoteling:
- Autofs – to serve shared storage
- SSSD – for GT authentication and scheduling
- Slurmd – Slurm daemon
- Fail2ban or an alternative denyhosts-style solution – to block unauthorized access attempts
- Standard TSO Salt infrastructure for monitoring and any GT security packages
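On a systemd-based server, a quick way to confirm the daemons above are present and running is a check like the following. The unit names shown are common defaults and may differ slightly by distribution, so treat this as an illustrative sketch rather than an exact RG requirement.

```shell
# Check that the RG hoteling services are active. The systemd unit
# names here (autofs, sssd, slurmd, fail2ban, salt-minion) are the
# usual defaults and may vary on your distro.
for svc in autofs sssd slurmd fail2ban salt-minion; do
    systemctl is-active --quiet "$svc" \
        && echo "$svc: running" \
        || echo "$svc: NOT running"
done
```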
You must also agree to share the relevant IPMI logins for your servers with the relevant admins to allow proper maintenance of included machines.
Equipment that is not a server must be attached to a remotely accessible host to be part of the RG hoteling scheme. For a novel board such as an FPGA or test chip, this can usually be done with a desktop or Raspberry Pi host connected to the device under test over USB-UART. In this case, we ask that the services above be installed on the host device to provide storage and scheduling support. Exceptions can be made for non-schedulable resources or devices with limited host compute power.
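As an example of the host arrangement described above, a Raspberry Pi wired to a board's UART through a USB serial adapter typically exposes the device-under-test console through a serial terminal. The device path and baud rate below are common defaults, not RG-specific values.

```shell
# List USB serial adapters attached to the host
ls /dev/ttyUSB*

# Open the device-under-test console at 115200 baud
# (on most distros this requires membership in the "dialout" group)
screen /dev/ttyUSB0 115200
```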
To join CRNCH RG via hoteling agreement, please send us a note via our help ticketing system so we can discuss further.
What are some alternatives to CRNCH RG hosting?
The default model for faculty in GT’s College of Computing is that each lab installs and maintains their own equipment or asks the Technology Service Organization (TSO) for some minimum level of service. CRNCH RG advocates for a shared usage model because independent lab setups require one of your grad students to be the primary IT support, and it duplicates expensive resources like storage servers, switches, and tool licenses.
The Partnership for Advanced Computing Environments (PACE) used to have a more formal hoteling agreement, and CRNCH's hoteling agreement is based in part on the success of this model. PACE has since switched to an internal cloud model that gives researchers credits for use on PACE machines in a larger CPU and GPU cluster environment, Phoenix, as well as hot and cold storage.