Gateway server
Latest revision as of 15:42, 30 July 2020
This page provides details on the tools and interfaces provided to monitor and diagnose issues with the Gateway Server.
Note: When connecting with PuTTY, use the host name gateway.tms-uk-rail.co.uk.
Glances / top
These tools are provided to visually display the performance and load of the operating system and to view running services. Administrators should be cognisant of:
- High CPU load and any associated warnings;
- High memory usage and any associated warnings - including swap file usage;
- High file system usage;
- Running processes, or processes not running that should be (note: a script is provided for this purpose).
Glances
Click here to learn more about Glances.
There are two ways to access Glances on the Gateway Server:
- Method 1: Access via the Web Interface (http://77.68.28.115:61208/), or
- Method 2: Start an ssh session and, at the prompt, run:
<user>@prod_tms_server:~$ glances
To quit Glances, press Ctrl + C.
top
Click here to learn more about top.
To access top, start an ssh session and, at the prompt, run:
<user>@prod_tms_server:~$ top
What to look for
Top Left: The CPU and MEM percentages should not be too high (for example, over 90%).
Top Right: The MEM, SWAP and LOAD figures should be low.
Bottom Left: If there is an issue, it shows in the FILE SYS section: the numbers there will be high and no longer in the green.
If the CPU values are high, consider draining the forecast. For more information see Forecasting.
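The 90% guide figure above can also be checked from the command line. The following is a minimal sketch, not part of the Gateway Server tooling: the function names `check_usage` and `mem_used_pct` are hypothetical, and the memory calculation assumes a Linux host where `/proc/meminfo` reports `MemTotal` and `MemAvailable`.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the Gateway Server).
# Guide figure taken from the text above: values over 90% are "too high".
THRESHOLD=90

# check_usage NAME PERCENT
# Prints OK or WARN for one usage figure; returns 1 on WARN.
check_usage() {
    name="$1"
    # Drop any decimal part so the integer comparison works (93.4 -> 93).
    pct="${2%%.*}"
    if [ "$pct" -gt "$THRESHOLD" ]; then
        echo "WARN: $name at ${pct}% (over ${THRESHOLD}%)"
        return 1
    fi
    echo "OK: $name at ${pct}%"
}

# mem_used_pct [FILE]
# Prints used memory as a whole-number percentage, read from a file in
# /proc/meminfo format (defaults to /proc/meminfo itself).
mem_used_pct() {
    awk '/^MemTotal:/ {t = $2}
         /^MemAvailable:/ {a = $2}
         END {if (t > 0) printf "%d\n", (t - a) * 100 / t}' "${1:-/proc/meminfo}"
}
```

On the server this could be used as `check_usage MEM "$(mem_used_pct)"`. Glances remains the richer view; the sketch is only useful for a quick check or for scripting.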
Application Logs
The Gateway Server provides comprehensive logs (http://77.68.28.115/logs/) for all running services. The logging level is set to provide debug and error information, so administrators can expect some logs to be quite large even though log rotation is in operation.
Tip: Sort by last modified to see recent entries. The log(s) will display an ERROR if there is an issue.
Remember: DARWIN goes down at 2am every day for approximately 10 minutes.
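Because the logs can be large, it may help to filter a downloaded copy for ERROR entries rather than scrolling in the browser. A minimal sketch follows, assuming the log has been saved locally and that problem entries contain the literal text `ERROR` as described above; the function name `recent_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Gateway Server tooling).
# recent_errors LOGFILE [COUNT]
# Prints the last COUNT lines (default 20) containing ERROR,
# each prefixed with its line number in the file.
recent_errors() {
    logfile="$1"
    count="${2:-20}"
    grep -n 'ERROR' "$logfile" | tail -n "$count"
}

# Example, assuming the log was first saved to the current directory:
#   curl -s -o mqipt.log http://77.68.28.115/logs/mqipt.log
#   recent_errors mqipt.log 10
```

Errors time-stamped around 2am may simply reflect the daily DARWIN outage noted above.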
RabbitMQ Interface
Without doubt, the RabbitMQ broker is the main component of the Gateway Server.
An interface is provided for users to monitor:
- Overview of all queued messages and message rates;
- Active connections including network message rate;
- Active channels and their connected services, including message rates;
- All live exchanges, including inbound/outbound message rates;
- Active message queues.
Note: The value of using the RabbitMQ interface (http://77.68.28.115:15672/) to diagnose issues cannot be overstated:
- Where, for example, a user has reported that the forecast service or the TD interface is not updating, RabbitMQ should always be the first port of call. Checking whether messages are being received and consumed by the Acumen application servers shows whether the problem lies with the external services or within Acumen.
What to look for
Check the Exchanges tab and click on the relevant feed to see graphs of the in/out message rate. This indicates whether a feed has been lost.
Note: The time frame for these graphs can be changed by clicking the link above the graph.
Where an administrator has identified an issue with the TD.net inbound feeds, an attempt should be made to diagnose the issue and restart the MQIPT service as per Gateway server#MQIPT.
If a feed has been lost, contact the responsible person as per Administration#Reporting Issues.
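The "received but not consumed" check can also be sketched from an ssh session. This assumes administrators are able to run `rabbitmqctl` on the Gateway Server, which the page does not confirm; the function name `flag_unconsumed` and the queue names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Gateway Server tooling).
# flag_unconsumed
# Reads tab-separated "name messages consumers" rows on stdin, as produced by
#   rabbitmqctl -q list_queues name messages consumers
# and prints every queue that holds messages but has no consumer attached.
# Header or informational lines are skipped because their second and third
# columns are not numeric.
flag_unconsumed() {
    awk -F '\t' 'NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 && $3 == 0 {
        printf "%s: %d queued, no consumers\n", $1, $2
    }'
}

# Example (run on the server):
#   sudo rabbitmqctl -q list_queues name messages consumers | flag_unconsumed
```

A queue that is filling with no consumers points to a problem within Acumen, whereas an exchange with no inbound messages points to the external feed. The web interface graphs remain the primary tool.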
Acumen Service Management Script
For convenience, a script is provided that guides administrators to the status of vital services on the Gateway Server. To access the script, start an ssh session and at the prompt:
<user>@prod_tms_server:~$ cd /var/www/server_tasks
<user>@prod_tms_server:/var/www/server_tasks$ ./service_script.sh
Administrators should observe the following (press Enter to reveal each line in turn):
[Image: Gateway_service_script.png]
All referenced services are configured as system services and are set up to restart on failure, so it would be unusual to observe any particular service down. If a service is observed to be down, however, the administrator should attempt to start it and consult the relevant logs.
Selecting X against a service entry shows the systemd logs for that service; this is useful for diagnosing any issues the operating system has had in starting and running the service. Type q to release.
MQIPT
This is the component of the Gateway Server that is responsible for maintaining the connection to TD.net. You can learn more about MQIPT in the IBM documentation: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.0.0/com.ibm.mq.ipt.doc/ipt2090_.htm
Where an administrator has identified an issue with the TD.net inbound feeds, an attempt should be made to diagnose the issue and restart the MQIPT service.
Diagnosis - Application Logs
Click on http://77.68.28.115/logs/mqipt.log to open the MQIPT application logs in your browser.
Administrators should look for error or exception messages; where these are the final log entries, this indicates an issue with the MQIPT service.
Diagnosis - systemd status
Start an ssh session and at the prompt:
<user>@prod_tms_server:~$ sudo systemctl status mqipt
If prompted, enter the user's password; the systemd status for the MQIPT service will be output to the terminal:
[Image: Mqipt systemd status.png]
Where the output indicates anything other than 'active (running)', the administrator should attempt to restart the service:
<user>@prod_tms_server:~$ sudo systemctl restart mqipt
Followed by a further check of the status to confirm the result of the restart:
<user>@prod_tms_server:~$ sudo systemctl status mqipt
Note: Where the status shows as active but messages are still not being received, a restart should also be attempted.
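The status, restart and re-check sequence above can be combined into one step. The following is a minimal sketch under stated assumptions: the function name `ensure_running` and the `RESTART_WAIT` variable are hypothetical, and it relies on `systemctl is-active`, which prints the unit's state (for example `active` or `failed`).

```shell
#!/bin/sh
# Hypothetical helper (not part of the Gateway Server tooling).
# ensure_running SERVICE
# Leaves an active service alone; otherwise restarts it, waits briefly,
# and prints the state reported after the restart.
ensure_running() {
    service="$1"
    if [ "$(systemctl is-active "$service")" = "active" ]; then
        echo "$service is active"
        return 0
    fi
    echo "$service is not active; restarting"
    sudo systemctl restart "$service"
    # Give the service a moment to start (seconds; default 2).
    sleep "${RESTART_WAIT:-2}"
    systemctl is-active "$service"
}

# Example (run on the server):
#   ensure_running mqipt
```

The sketch only covers the case where systemd reports the service as not active. Where the status is active but messages are still not arriving, the restart has to be issued manually as described in the note above.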