Tech:Graylog

Graylog is a log management solution for logs stored on the servers. As part of a central logging project, Miraheze is exploring Graylog for central log storage. The web interface is available at https://graylog.miraheze.org/. Access is restricted to Technical Team members, who can authenticate with their LDAP credentials. The point of contact for this service is User:Southparkfan.

Architecture
Graylog currently runs on graylog1.miraheze.org. Three daemons run there: graylog-server for the actual log management, elasticsearch for storing the logs and mongod for storing Graylog's configuration.

       Miraheze user
             |
             v
 +----------------------------------+                    +--------------------------------------+
 | test2.miraheze.org               |                    | graylog1.miraheze.org                |
 |                                  |                    |                                      |
 | MediaWiki ----------+            |                    |                                      |
 |                     v            |   TLS encrypted    |                                      |
 | NGINX --------> syslog-ng -------+--------------------+-> graylog-server ---- elasticsearch  |
 |                     ^            |     12210/tcp      |       |       \                      |
 | /dev/log -----------+            |                    |       |        +---- mongod          |
 | (kernel logs, etc.)              |                    |     NGINX                            |
 +----------------------------------+                    +-------|------------------------------+
                                                                 |
                                                         Tech Team member

In the example above, test2 runs syslog-ng, which receives the logs locally and forwards them to graylog-server. By setting  to 'syslog_ng' in Puppet, base::syslog installs syslog-ng and configures it to listen on   (for anything on the server sending its logs to that destination, such as MediaWiki and NGINX) and 'system' for services such as ssh and kernel logs.
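To make the local hand-off more concrete, here is a minimal Python sketch of how a service might format a syslog message and pass it to a local listener. It is an illustration only: the listening destination is not stated above, so the host, port and use of UDP are assumptions, and the function names are hypothetical. The message uses the short BSD-style form without timestamp or hostname, which the receiving daemon is assumed to fill in.

```python
import socket

# Standard syslog numeric codes (RFC 3164 / RFC 5424).
FACILITY_LOCAL0 = 16
SEVERITY_WARNING = 4
SEVERITY_INFO = 6


def format_syslog(tag: str, message: str,
                  facility: int = FACILITY_LOCAL0,
                  severity: int = SEVERITY_INFO) -> bytes:
    """Build a minimal BSD-style syslog datagram: <PRI>tag: message."""
    priority = facility * 8 + severity  # PRI encodes facility and severity
    return f"<{priority}>{tag}: {message}".encode("utf-8")


def send_to_local_syslog(payload: bytes, host: str, port: int) -> None:
    """Send one datagram to a syslog listener over UDP (assumed transport)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A caller would use it along the lines of `send_to_local_syslog(format_syslog("mediawiki", "test message"), "127.0.0.1", 10514)`, where the address and port are placeholders rather than the values used by base::syslog.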

Streams
Streams are Graylog's categories of data. By default, every message sent to Graylog is routed to the default stream. Streams are also useful for restricting what certain members can access. For example, MediaWiki Administrators can only access the streams for MediaWiki and NGINX logs.

Querying the data
Graylog's search syntax is close to Lucene's. For MediaWiki and NGINX, custom fields have been defined: go to https://graylog.miraheze.org/search and click on 'Fields' on the left. Using these fields, you can query the data. For example:


 * View NGINX logs for your IP address:
 * View all SSH logs:
 * View all MediaWiki errors and warnings:
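As a rough illustration of how such Lucene-style queries can be composed, the Python sketch below builds `field:"value"` terms and a link to the search page. The field names (`nginx_remote_addr`, `application_name`, `level`), the example IP address and the URL parameters (`q`, `rangetype`, `relative`) are all assumptions made for the example; the actual field names are the ones listed under 'Fields' in the web interface.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://graylog.miraheze.org/search"  # search page named in the text


def term(field: str, value: str) -> str:
    """Return field:"value", quoting the value so characters like ':' stay literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def search_url(query: str, seconds: int = 300) -> str:
    """Build a search link for a query over the last `seconds` seconds.

    The parameter names are an assumption about how the web interface
    encodes searches, not something confirmed by this page.
    """
    params = {"q": query, "rangetype": "relative", "relative": seconds}
    return f"{SEARCH_URL}?{urlencode(params)}"


# Hypothetical field names and values, for illustration only.
nginx_for_ip = term("nginx_remote_addr", "203.0.113.7")
ssh_logs = term("application_name", "sshd")
# Assumes 'level' holds the numeric syslog severity (0 = emergency, 4 = warning).
mw_errors = f'{term("application_name", "mediawiki")} AND level:[0 TO 4]'
```

Quoting every value is a deliberate choice here: it keeps IPv6 addresses and other values containing colons from being parsed as field separators.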

Administration
So far, configuring Graylog appears to be a combination of Puppet and the web interface (configuration made through the web interface ends up stored in MongoDB on graylog1.miraheze.org). role::graylog is used for graylog1's configuration. base::syslog contains the configuration for every server logging to Graylog.

Future

 * Building a graylog2 node for replication/redundancy
 * Applying authentication mechanisms + encryption to elasticsearch for Grafana <-> Elasticsearch integration