ClickHouse is a column database developed by Yandex to process analytical queries which is ideal for solving digital marketing tasks.
We have prepared this guide to facilitate the start of digital analysts’ work with ClickHouse.
Server purchase for ClickHouse setup
In order to set up ClickHouse, you need to purchase a virtual machine which operates on Ubuntu.
As an example, let us consider server purchases from DigitalOcean.
After registration select Create Droplets:
Ubuntu as an operation system.
Amsterdam as a server location.
Specify the size of a virtual machine: 3 GB Memory, 1 vCPU and 60 GB SSD.
With a gradual increase in the amount of data you will need a more productive server. DigitalOcean supports server scaling which allows to easily enhance technical characteristics.
After a virtual machine is created you get an email with the SSH access ro the server.
To set up ClickHouse, you need to address a server using the SSH client.
SSH client is a software which allows managing the server via the command line.
Any application is suitable, for instance:
Launch the SSH client.
To connect to a server, specify its IP address, login, and then set up a password after authorization.
Then follow the instructions prepared by our team.
To do that, copy commands from Google Docs or sequentially insert the commands mentioned below in the SSH client console.
#1. Specify a repository from which ClickHouse will be downloaded:
1sudo apt-add-repository "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
#2. Then launch the setup process (it takes several minutes):
1sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
2sudo apt-get update
3sudo apt-get install clickhouse-client clickhouse-server
#3. Launch ClickHouse:
1sudo service clickhouse-server start
The setup is complete. To make sure it was successful, launch the console client:
After that execute a test query on ClickHouse:
If everything is okay, then the query will answer with 1. Exit the client and proceed to the final step: the ClickHouse setup
To ensure further secure and comfortable operation of ClickHouse you need to set up the following:
Open the IP connection.
Right after setup ClickHouse is closed for connection
Set up Tabix.
Working with ClickHouse is more convenient via the graphic client Tabix, which is an editor of select queries.
Create an account.
You need to create an account that you will use to access ClickHouse. By default, a password is not set up in ClickHouse.
Opening the IP connection
Server access settings are specified in the config.xml; to open it, run this command:
1sudo nano /etc/clickhouse-server/config.xml
As a result, a file with ClickHouse settings will open:
Or specify the IP address if a connection from a particular IP address is needed. For instance, <listen_host>0.0.0.0</listen_host>.
Save changes by pressing the hotkeys: ^X.
Tabix (a graphic client) setup
To set up Tabix, you only need to uncomment the tag <http_server_default_response...>, in config.xml which will upload Tabix:
Tabix is loading via the link:
Where instead of 0.0.0.0 you need to specify IP address of your server. By default, an account with login Default is created in ClickHouse. Use it to enter Tabix:
By default, ClickHouse creates a user with the name Default without a password. You can set up a password in an open or encoded way (SHA-256).
These passwords are no cure from a potential fraud. They are more like a protection from staff.
You can generate a password with this command:
1echo -n "testPasword" | sha256sum | tr -d '-'
Where you specify the necessary password instead of testPasword.
After running a command we are getting an encoded password.
To set it as your password, you need to edit the users.xml file.
To do that, run the following command:
1sudo nano /etc/clickhouse-server/users.xml
A file with the ClickHouse user settings will open:
You can also specify password in an open manner:
Do not forget to save changes. From now on, to connect to Tabix you need to specify login and password.
Data uploading to ClickHouse
Renta allows uploading data from different advertising sources, analytics systems and databases, for instance, from Google AdWords or Yandex.Direct.
A complete list of connectors and the setup process are described in the documentation.
Streaming raw Google Analytics data with an automatic merge of data from advertising sources, CRM system is also available to ClickHouse.
Connection of Power BI to ClickHouse
You can connect to ClickHouse via the odbc driver or through http.
Making it through http is more convenient.
To do that, select Web as the data source in Power BI:
the server address and port,
ClickHouse access credentials,
select query (which will return necessary data),
the extract format: TSV or CSV.
For instance, in Power BI you need to collect sessions count from advertising sources. Let us assume that the ClickHouse query looks like the following:
Then we prepare the URL which will return the query execution reault in the CSV file format.
The URL structure should look like this:
1// specify server accesses
4// Next is the sql-query:
16//specify the file format
As a final step — you need to transform the acquired query to the URL format:
To transform a query to the URL format, you can use any decoder.
Setting up an update in Power BI Online
Power BI Online requires setting a gateway to update reports in the cloud. But you cannot bypass this if you bind a domain to the server with ClickHouse.
Create a subdomain in the domain settings, and specify the server IP address as an A entry.
Setup is happening on the hosting side to which the domain is bound.
As a result, specify the domain as the data source instead of an IP address.
Gateway setup is not required when using the subdomain.
Still have questions?
If you have any questions — post the in the Comments section.
We will be happy to provide you with answers to them.