Tucson Test Stand¶
This section contains site-specific variations for the Tucson test stand.
Resources¶
- LOVE: http://love1.tu.lsst.org
- LOVE (k8s): http://love.tu.lsst.org
- Argo CD: https://tucson-teststand.lsst.codes/argo-cd
- Chronograf: https://tucson-teststand.lsst.codes/chronograf
- Nublado: https://tucson-teststand.lsst.codes/
- Rancher: https://rancher.tu.lsst.org (1)
- Slack: #rubinobs-tucson-teststand
(1) The kubeconfig file is downloaded from Rancher. File a Jira ticket with Tucson IT to request access. Once you can log into Rancher:
- Select the pillan cluster.
- Click the Kubeconfig File button in the top-right.
- Near the bottom of the dialog, click the download link.
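Once the file is downloaded, cluster access can be checked from a terminal. The following is a minimal sketch rather than part of the official procedure. The save location `~/.kube/pillan.yaml` and the function name `check_pillan_access` are hypothetical, and the `KUBECTL` variable exists only so the function can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Sketch: point kubectl at the kubeconfig downloaded from Rancher and confirm access.
# The default path below is a hypothetical example; use wherever you saved the file.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-$HOME/.kube/pillan.yaml}"
KUBECTL="${KUBECTL:-kubectl}"

check_pillan_access() {
  if [ ! -f "$KUBECONFIG_FILE" ]; then
    echo "kubeconfig not found: $KUBECONFIG_FILE" >&2
    return 1
  fi
  # kubectl reads the KUBECONFIG environment variable to locate its config file.
  export KUBECONFIG="$KUBECONFIG_FILE"
  "$KUBECTL" config current-context
  "$KUBECTL" get nodes
}
```

Calling `check_pillan_access` should print the active context followed by the cluster nodes if the kubeconfig is valid and the cluster is reachable.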
Non-Production Systems¶
The Tucson test stand operates all CSCs and systems on the production domain.
Bare Metal Machines¶
- LOVE: love1.tu.lsst.org
- T&S CSCs: tel-hw1.tu.lsst.org
- Kubernetes: Can be accessed from your own machine; you only need the kubeconfig file and kubectl installed.
  - Systems run on the pillan cluster.
  - Can also use: https://k8slens.dev/.
- ATCamera (Tony Johnson): auxtel-mcm.tu.lsst.org
- CCCamera (Tony Johnson): comcam-mcm.tu.lsst.org
- ATArchiver (Steve Pietrowicz): auxtel-archiver.tu.lsst.org
- CCArchiver (Steve Pietrowicz): comcam-archiver.tu.lsst.org
- Calibration Systems (Patrick Ingraham): loonie.tu.lsst.org
LOVE Summary View¶
The overall system summary state view is called Summary State.
Checking the Number of Federations¶
This uses a script in https://github.com/lsst-ts/k8s-admin. Run ./feds-check from a machine with kubectl and the proper kubeconfig file.
Shutdown DM and Camera Services¶
- Shutdown/Cleanup daemon on Archiver machines:
  - docker stop ospl-daemon
  - docker rm ospl-daemon
- Shutdown Camera OCS Bridges:
  - ATCamera: sudo systemctl stop ats-ocs-bridge.service
  - CCCamera: sudo systemctl stop comcam-ocs-bridge.service
- Shutdown Camera Daemons:
  - sudo systemctl stop opensplice.service
  - Command is the same everywhere.
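The camera steps above can be sketched as a single helper. This is an illustration only: it assumes it is run on the camera machine that hosts both the bridge and the OpenSplice daemon. The function name `stop_camera_services` and the `SYSTEMCTL` and `SUDO` override variables are hypothetical; the service names come from the list above.

```shell
#!/usr/bin/env bash
# Sketch: stop the OCS bridge, then the OpenSplice daemon, on one camera machine.
# SYSTEMCTL and SUDO are overridable so the function can be exercised with stubs.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
SUDO="${SUDO-sudo}"

stop_camera_services() {
  case "$1" in
    ATCamera) bridge="ats-ocs-bridge.service" ;;
    CCCamera) bridge="comcam-ocs-bridge.service" ;;
    *) echo "usage: stop_camera_services ATCamera|CCCamera" >&2; return 2 ;;
  esac
  # Bridge first, then the daemon, matching the order of the list above.
  # $SUDO is intentionally unquoted so that an empty value disappears.
  $SUDO "$SYSTEMCTL" stop "$bridge" &&
    $SUDO "$SYSTEMCTL" stop opensplice.service
}
```

For example, `stop_camera_services ATCamera` would be run on auxtel-mcm and `stop_camera_services CCCamera` on comcam-mcm.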
Shutdown LOVE¶
This needs to be done from love1.
- Uses the docker-compose-admin scripts in the tucson-teststand/love1 directory.
  - ./shutdown_love
  - ./shutdown_daemon
Shutdown T&S Bare Metal Services¶
Handle tel-hw1:
- Uses the docker-compose-admin scripts in the tucson-teststand/tel-hw1 directory.
  - ./shutdown_atmcs_atp
  - ./shutdown_m1m3
  - ./shutdown_daemon
Handle calibration systems:
Log into the machines listed under Calibration Systems in the Bare Metal Machines section, then stop and remove all running containers.
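One possible way to stop and remove all running containers on a machine is sketched below. This is an assumption about how the step could be done, not the documented procedure. The function name `stop_and_remove_running` is hypothetical, and the `DOCKER` variable exists only so the function can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Sketch: stop, then remove, every running container on the current host.
DOCKER="${DOCKER:-docker}"

stop_and_remove_running() {
  # "docker ps -q" prints only the IDs of running containers,
  # so containers that are already stopped are left alone.
  ids="$("$DOCKER" ps -q)"
  if [ -z "$ids" ]; then
    echo "No running containers."
    return 0
  fi
  # Word splitting on $ids is intentional: one argument per container ID.
  # shellcheck disable=SC2086
  "$DOCKER" stop $ids
  # shellcheck disable=SC2086
  "$DOCKER" rm $ids
}
```

Run `stop_and_remove_running` on each calibration machine after logging in. Depending on how Docker is set up on the machine, the commands may need to be run with sudo.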
Interacting with Kubernetes¶
Commands can be executed from your own machine with kubectl and the proper kubeconfig file.
Shutdown Main Daemon¶
The main daemon on TTS runs on Kubernetes.
Shut it down by deleting the deployment under the ospl-main-daemon app on Argo CD.
Update Configuration¶
- Gather the branch for the configurations and the version number for ts_ddsconfig.
- Uses the docker-compose-admin scripts in the tucson-teststand directory.
- Directories to update:
  - /deploy-lsstts/docker-compose-ops (love1, tel-hw1)
  - /deploy-lsstts/ts_ddsconfig (love1, tel-hw1)
  - /deploy-lsstts/LOVE-integration-tools (love1)
- sudo ./update_repo <repo path> <branch or version>
  - This will fail if the branch has local modifications. If it does, clean up the repository manually. Here is one way to do that:
    - cd /deploy-lsstts/<problem directory>
    - git status
    - sudo git reset --hard origin/<current ticket branch>
    - Return to the docker-compose-admin scripts and run the update_repo command again.
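The manual cleanup can be sketched as a small helper that wraps the same git commands. The function name `reset_repo_to_branch` and the `GIT` and `SUDO` override variables are hypothetical and exist only for illustration and testing. Note that `git reset --hard` discards any local modifications in the repository.

```shell
#!/usr/bin/env bash
# Sketch: discard local modifications in one repository so update_repo can succeed.
GIT="${GIT:-git}"
SUDO="${SUDO-sudo}"

# The body runs in a subshell, so the caller's working directory is unchanged.
reset_repo_to_branch() (
  repo_dir="$1"   # for example /deploy-lsstts/<problem directory>
  branch="$2"     # the branch currently checked out in that directory
  cd "$repo_dir" || exit 1
  # Show what is about to be thrown away before resetting.
  "$GIT" status
  # $SUDO is intentionally unquoted so that an empty value disappears.
  $SUDO "$GIT" reset --hard "origin/$branch"
)
```

After `reset_repo_to_branch <repo path> <branch>` succeeds, return to the docker-compose-admin scripts and run update_repo again.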
Startup Main Daemon¶
The main daemon on TTS runs on Kubernetes and is handled by the sync_apps.py script. This is detailed in the next section.
Startup Minimal Kubernetes System¶
This replaces most of step 6.3 in the main document. Follow the first three bullet points in that step and then continue the process with the next steps.
- python sync_apps.py -p
- The csc-cluster-config, ospl-config and ospl-main-daemon apps will be synced automatically.
- Once the ospl-main-daemon app is synced, the script will pause.
  - Check the logs in the Argo CD UI to see if the daemon is ready.
  - Type go and press Enter to move on to syncing the ospl-daemon app.
- Once the ospl-daemon app is synced, the script will pause.
  - Check the logs in the Argo CD UI to see if the daemons are ready.
  - Type go and press Enter to move on to syncing the kafka-producers app.
- The script will pause again once the kafka-producers are synced.
  - The kafka-producers use a startup probe, so once all of the pods show a green heart, type go and press Enter to move on to syncing the love app.
- Once the love app is synced, stop here and return to step 6.4 in the main document.
  - Make sure you leave the script running.
Startup LOVE¶
This needs to be done from love1.
- Uses the docker-compose-admin scripts in the tucson-teststand/love1 directory.
  - ./launch_daemon
  - Ensure daemon is ready before proceeding.
  - ./launch_love
Startup T&S Bare Metal Services¶
Handle tel-hw1:
- Uses the docker-compose-admin scripts in the tucson-teststand/tel-hw1 directory.
  - ./launch_daemon
  - Ensure daemon is ready before proceeding.
  - ./launch_atmcs_atp
  - ./launch_m1m3
Enabled CSCs¶
If proceeding with integration testing, the CSCs will be brought to ENABLED state as part of that process. The startup processes above may also be needed to recover the TTS after maintenance. In that case, all of the CSCs must be returned to ENABLED state. The following components automatically transition to ENABLED state when launched:
- Watcher
- ScriptQueue:1
- ScriptQueue:2
- DSM:1
- DSM:2
For the other components, leverage the following scripts. Required configurations will be given for each script execution.
Note
Both ATCamera and CCCamera must be in OFFLINE_AVAILABLE state before putting them into ENABLED state.
auxtel/enable_atcs.py
    athexapod: ncsa
    atdome: current
    ataos: current
auxtel/enable_latiss.py
    atcamera: Normal
    atspectrograph: current
maintel/enable_mtcs.py
    mtm1m3: Default
    mthexapod_1: default
    mthexapod_2: default
maintel/enable_comcam.py
    cccamera: Normal
set_summary_state.py
    data:
      - - DIMM:1
        - ENABLED
        - current
      - - DIMM:2
        - ENABLED
        - current
      - - WeatherStation:1
        - ENABLED
        - default
set_summary_state.py
    data:
      - - Scheduler:1
        - ENABLED
        - standstill
      - - Scheduler:2
        - ENABLED
        - standstill
      - - OCPS:1
        - ENABLED
        - LATISS
      - - OCPS:2
        - ENABLED
        - LSSTComCam
Note
The Schedulers MUST be ENABLED AFTER ATPtg and MTPtg have been ENABLED. Otherwise they will go into FAULT state. That is why this script execution is run last.