Backends¶
Ralph supports several storage, stream and database backends that it can read learning events from or write them to. Implemented backends are listed below along with their configuration parameters. If your favourite backend is missing from the list, feel free to submit your implementation or get in touch!
Key concepts¶
We distinguish storage from database backends because their semantics and concepts differ substantially in the code. From a Ralph user's perspective, however, both types can simply be considered “backends”, as the CLI can use either in most of its commands.
Each backend has its own parameters that are required to use it. Those parameters can be set as command line options or environment variables; the latter is the recommended solution for sensitive data such as service credentials. If we consider the os_username (OpenStack user name) parameter of the OpenStack Swift backend, it can be set as a command line option using swift as the option prefix (and replacing underscores in its name with dashes):
$ ralph list --backend swift --swift-os-username johndoe # [...] more options
Alternatively, this parameter can be set as an environment variable (in upper case, prefixed by the program name, e.g. RALPH_):
$ export RALPH_BACKENDS__STORAGE__SWIFT__OS_USERNAME="johndoe"
$ ralph list --backend swift # [...] more options
The general patterns for backend parameters are:
- --{{ backend_name }}-{{ parameter | underscore_to_dash }} for command options, and
- RALPH_BACKENDS__{{ backend_type | uppercase }}__{{ backend_name | uppercase }}__{{ parameter | uppercase }} for environment variables, where the backend_type is one of DATABASE, STORAGE and STREAM.
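The two naming patterns can be sketched in a few lines of Python. This is only an illustration of the patterns described above: the helper names cli_option and env_variable are hypothetical and are not part of Ralph's API.

```python
# Hypothetical helpers illustrating the naming patterns; not part of Ralph.
BACKEND_TYPES = ("DATABASE", "STORAGE", "STREAM")


def cli_option(backend_name: str, parameter: str) -> str:
    """Build the command line option name for a backend parameter."""
    # Underscores in the parameter name are replaced by dashes.
    return f"--{backend_name}-{parameter.replace('_', '-')}"


def env_variable(backend_type: str, backend_name: str, parameter: str) -> str:
    """Build the environment variable name for a backend parameter."""
    backend_type = backend_type.upper()
    if backend_type not in BACKEND_TYPES:
        raise ValueError(f"Unknown backend type: {backend_type}")
    # Segments are upper-cased and joined with double underscores.
    return (
        f"RALPH_BACKENDS__{backend_type}"
        f"__{backend_name.upper()}__{parameter.upper()}"
    )


print(cli_option("swift", "os_username"))
print(env_variable("storage", "swift", "os_username"))
```

For the Swift example used earlier, these helpers produce --swift-os-username and RALPH_BACKENDS__STORAGE__SWIFT__OS_USERNAME respectively.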
Storage backends¶
OVH - Logs Data Platform (LDP)¶
LDP is a nice service built by OVH on top of Graylog to follow, analyse and store your logs. Learning events (aka tracking logs) can be stored in GELF format using this backend.
For now the LDP backend is read-only, as we consider that it is mostly used to collect primary logs and not as a Ralph target. Feel free to get in touch to prove us wrong, or better: submit your proposal for the write method implementation.
Backend parameters¶
To access OVH’s LDP API, you need to register Ralph as an authorized application and generate an application key, an application secret and a consumer key.
While filling in the registration form available at eu.api.ovh.com/createToken/, be sure to give an appropriate validity time span to your token and allow only GET requests on the /dbaas/logs/* path.
- endpoint: the API endpoint (e.g. ovh-eu)
- application_key: use generated application key
- application_secret: use generated application secret
- consumer_key: use generated consumer key
The following parameters are required to fetch archives from an LDP account stream:
- service_name: the LDP account name (e.g. ldp-xx-xxxxx)
- stream_id: the identifier of the stream you are querying (e.g. a UUID hex representation: 77ec6e4a-ac15-4bcf-8043-7429bf275e49)
For more information about OVH’s API client parameters, please refer to the project’s documentation: github.com/ovh/python-ovh.
OpenStack Swift¶
Swift is the OpenStack object storage service. This storage backend is fully supported (read and write operations) to stream and store log archives.
Backend parameters¶
The primary required parameters correspond to standard authentication with the OpenStack Keystone service:
- os_identity_api_version: the Keystone API version you will authenticate to (defaults to 3)
- os_auth_url: the authentication URL (defaults to OVH’s Swift https://auth.cloud.ovh.net/)
- os_project_domain_name: the project domain name (defaults to Default)
- os_user_domain_name: the user domain name (defaults to Default)
- os_username: the name of your OpenStack Swift user
- os_password: the password of your OpenStack Swift user
Secondary parameters are required to work with the target container:
- os_storage_url: the URL of the target container
- os_region_name: the region where your container is
- os_tenant_name: the name of the tenant of your container
- os_tenant_id: the identifier of the tenant of your container
Amazon S3¶
S3 is the Amazon Simple Storage Service. This storage backend is fully supported (read and write operations) to stream and store log archives.
Backend parameters¶
The primary required parameters correspond to standard AWS CLI authentication:
- access_key_id: the access key for your AWS account
- secret_access_key: the secret key for your AWS account
- session_token: the session token for your AWS account (only needed when you are using temporary credentials)
Secondary parameters are required to work with the target bucket:
- default_region: the region where your bucket is
- bucket_name: the name of your S3 bucket
- endpoint_url: the endpoint of your S3 server
File system¶
The file system backend is a dummy template that can be used to develop your own backend. It’s a “dummy” backend because it is not required in a UNIX shell context: the ls and cat commands, used along with UNIX streams, will do a better job.
Backend parameters¶
The only required parameter is the path we want to list or stream content from.
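To make the "list or stream content from a path" idea concrete, here is a minimal sketch of what the ls and cat equivalents look like in Python. The function names list_archives and read_archive are hypothetical and do not reflect Ralph's actual backend interface; they only illustrate the two operations a storage backend built from this template would need.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List


def list_archives(path: str) -> List[str]:
    """Return the sorted names of regular files in `path` (similar to `ls`)."""
    return sorted(entry.name for entry in Path(path).iterdir() if entry.is_file())


def read_archive(path: str, name: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the content of `path/name` chunk by chunk (similar to `cat`)."""
    with open(Path(path) / name, "rb") as archive:
        while True:
            chunk = archive.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # Demonstrate both operations on a throwaway directory.
    with tempfile.TemporaryDirectory() as directory:
        (Path(directory) / "events.log").write_bytes(b'{"event": "demo"}\n')
        for archive_name in list_archives(directory):
            content = b"".join(read_archive(directory, archive_name))
            print(archive_name, len(content))
```

Reading in chunks keeps memory usage constant for large archives, which matters when the content is piped to another command.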
Stream backends¶
WebSocket¶
The WebSocket backend is read-only and can be used to get real-time events.
If you use OVH’s Logs Data Platform (LDP), you can retrieve a WebSocket URI to test your data stream by following instructions from the official documentation.
Backend parameters¶
- ws_uri: the WebSocket URI (e.g. wss://example.com/websocket)
Database backends¶
Elasticsearch¶
The Elasticsearch backend is mostly used for indexing purposes (as a data lake), but it can also be used to fetch indexed data.
Backend parameters¶
Elasticsearch backend parameters required to connect to a cluster are:
- hosts: a list of cluster hosts to connect to (e.g. ["http://elasticsearch-node:9200"])
- index: the Elasticsearch index where to get/put documents
- client_options: a comma separated key=value list of Elasticsearch client options
The Elasticsearch client options supported in Ralph are:
- ca_certs: the path to the CA certificate file.
- verify_certs: enable or disable certificate verification. Note that it should be enabled in production. Defaults to True.
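The "comma separated key=value list" format used by client_options can be illustrated with a small parser. This is a sketch of the format only: parse_client_options is a hypothetical helper, values are kept as plain strings, and how Ralph itself parses and type-converts these options is not covered here. The certificate path is a made-up example.

```python
from typing import Dict


def parse_client_options(raw: str) -> Dict[str, str]:
    """Parse a comma separated key=value list into a dictionary of strings."""
    options: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            # Tolerate empty segments such as a trailing comma.
            continue
        key, separator, value = item.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"Expected key=value, got {item!r}")
        options[key.strip()] = value.strip()
    return options


print(parse_client_options("ca_certs=/path/to/ca.crt,verify_certs=True"))
```

Note that this simple format cannot represent values that themselves contain a comma.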
MongoDB¶
The MongoDB backend is mostly used for indexing purposes (as a data lake), but it can also be used to fetch collections of documents.
Backend parameters¶
MongoDB backend parameters required to connect to a cluster are:
- connection_uri: the connection URI to connect to (e.g. mongodb://mongo:27017/)
- database: the database to connect to
- collection: the collection to get/put objects to
- client_options: a comma separated key=value list of MongoDB client options
The MongoDB client options supported in Ralph are:
- document_class: default class to use for documents returned from queries
- tz_aware: if True, datetime instances returned as values in a document will be timezone aware (otherwise they will be naive)
ClickHouse¶
The ClickHouse backend can be used as a data lake and to fetch collections of documents from it.
Backend parameters¶
ClickHouse parameters required to connect are:
- host: the hostname to connect to (e.g. "clickhouse.yourhost.com")
- port: the port of the ClickHouse HTTP(S) interface (e.g. 8123)
- database: the name of the database to connect to
- event_table_name: the name of the table to write statements to
- client_options: a comma separated key=value list of ClickHouse client options
Secondary parameters are needed if not using the default ClickHouse user:
- username: the username to connect as
- password: the password for the given ClickHouse username
By default, the following client options are set. If you override the default client options, you must also set these:
- "date_time_input_format": "best_effort" allows RFC date parsing
- "allow_experimental_object_type": 1 allows the JSON data type we use to store statements
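One way to avoid losing the two defaults when overriding client options is to merge your own options on top of them. The sketch below assumes a plain dictionary representation; with_default_client_options is a hypothetical helper, and max_execution_time is used only as an example of an additional ClickHouse setting.

```python
from typing import Any, Dict

# The two options listed above, which must be kept when overriding.
DEFAULT_CLIENT_OPTIONS: Dict[str, Any] = {
    "date_time_input_format": "best_effort",
    "allow_experimental_object_type": 1,
}


def with_default_client_options(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults updated with `overrides` (overrides take precedence)."""
    return {**DEFAULT_CLIENT_OPTIONS, **overrides}


# Example: add one extra setting while keeping both defaults.
print(with_default_client_options({"max_execution_time": 30}))
```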
The ClickHouse client options supported in Ralph can be found in these locations:
- Python driver specific
- General ClickHouse client settings