# My server setup notes



## W1zzard (Aug 9, 2014)

Dumping them here, in case some random internet person finds them via Google:

This is for CentOS 7 + Docker + GlusterFS + Pacemaker. We run all our other services inside Docker containers, which are managed via Pacemaker.


```
# mirror used: centos.mirror.constant.com/7/os/x86_64/

yum -y remove audit iprutils i*firmware libertas-*-firmware
rpm -e alsa-tools-firmware alsa-firmware aic94xx-firmware fxload
rpm -e postfix

rpm -i http://mirror.de.leaseweb.net/epel/beta/7/x86_64/epel-release-7-0.2.noarch.rpm
yum -y update
yum -y install chrony tar telnet mc nano wget psmisc sysstat iftop iotop screen bind-utils net-tools xfsprogs traceroute tcpdump rsync mysql bash-completion php-cli iptraf hdparm strace
yum -y install docker kvm qemu-kvm libvirt virt-clone pacemaker pcs
systemctl enable docker
echo DOCKER_OPTS="-r=false" > /etc/sysconfig/docker

sed -i -e"s/SELINUX=enforcing$/SELINUX=disabled/" /etc/selinux/config

echo "net.ipv4.conf.all.arp_ignore=1" >> /etc/sysctl.conf
echo "net.ipv4.ip_nonlocal_bind=1" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max=10000000" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_tcp_timeout_established=7875" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog=65535" >> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range=1024 65535" >> /etc/sysctl.conf

echo "/swapfile none swap defaults 0 0" >> /etc/fstab
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon -a

echo "password" | passwd --stdin hacluster

yum -y install iptables-services
cat <<EOF > /etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT

# Always allow internal traffic
-A INPUT -i br0 -j ACCEPT
-A INPUT -i eth0 -j ACCEPT

# Docker images
-A INPUT -i br1 -m conntrack --ctstate NEW -d tpuadsrv-vip-ext -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -i br1 -m conntrack --ctstate NEW -d tpuwww-vip-ext -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -i br1 -m conntrack --ctstate NEW -d tpucdn-vip-ext -m tcp -p tcp --dport 80 -j ACCEPT

# This host
-A INPUT -i br1 -m conntrack --ctstate NEW -d 108.61.17.98 -m tcp -p tcp --dport 22 -j ACCEPT

-A INPUT -i br1 -j REJECT --reject-with icmp-host-prohibited

# Need ACCEPT for virtual interfaces
-A INPUT -j ACCEPT

-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
EOF
systemctl enable iptables
yum -C -y remove firewalld --setopt="clean_requirements_on_remove=1"

yum -C -y remove authconfig --setopt="clean_requirements_on_remove=1"

yum -y install exim
perl -i -pe 'BEGIN{undef $/;} s/(daemon_smtp_ports =)/local_interfaces = 127.0.0.1.25\n$1/smg' /etc/exim/exim.conf
perl -i -pe 'BEGIN{undef $/;} s/(begin routers\s+).*?(begin)/$1tpumail:\n  driver = manualroute\n  transport = remote_msa\n  route_list = * mail.techpowerup.com\n\n$2/smg' /etc/exim/exim.conf
perl -i -pe 'BEGIN{undef $/;} s/(begin authenticators\s+)(.*?begin)/$1tpumail_login:\n  driver = plaintext\n  public_name = LOGIN\n  hide client_send = : servers\@techpowerup.com : password\n\n$2/smg' /etc/exim/exim.conf
chmod 600 /etc/exim/exim.conf

nano /etc/default/grub
# remove: rhgb quiet
# add: consoleblank=0 net.ifnames=0
grub2-mkconfig -o /boot/grub2/grub.cfg

yum -y autoremove NetworkManager

yum -y install rsyslog
cat <<END > /etc/rsyslog.conf
\$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
\$ModLoad imjournal # provides access to the systemd journal
\$ModLoad imklog  # provides kernel logging support (previously done by rklogd)

\$WorkDirectory /var/lib/rsyslog
\$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

\$OmitLocalLogging on

\$IMJournalStateFile imjournal.state

\$ActionQueueFileName fwdRule1 # unique name prefix for spool files
\$ActionQueueMaxDiskSpace 1g  # 1gb space limit (use as much as possible)
\$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
\$ActionQueueType LinkedList  # run asynchronously
\$ActionResumeRetryCount -1  # infinite retries if host is down

*.* @@logserver-vip
END
systemctl start rsyslog
systemctl enable rsyslog

mkdir -p /var/log/journal

systemctl enable dnsmasq
systemctl start dnsmasq

# setup network interfaces

# reboot
# remove old kernel

scp 10.0.2.0:/root/.ssh/authorized_keys ~/.ssh/authorized_keys
scp 10.0.2.0:/root/.ssh/id_rsa ~/.ssh/id_rsa
scp 10.0.2.0:/root/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub

scp 10.0.2.0:/etc/hosts /etc/hosts

rm -rf /etc/audit/ /etc/firewalld/ /etc/NetworkManager/ /var/lib/NetworkManager/ /var/log/audit/ /var/log/messages /var/log/maillog /var/lib/postfix/ /var/spool/postfix/

scp node2:/etc/corosync/authkey /etc/corosync/authkey
scp node2:/etc/corosync/corosync.conf /etc/corosync/corosync.conf

systemctl enable corosync pacemaker pcsd
systemctl restart corosync pacemaker pcsd

pcs cluster auth

pcs cluster setup --name cluster node1 node2

cd /etc/yum.repos.d/
wget http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo

yum -y install glusterfs-server attr
systemctl enable glusterd
systemctl start glusterd

## replace glusterfs node
# on the other node: grep node3 /var/lib/glusterd/peers/*

echo UUID=1d4bbd3c-85e2-4661-b41d-4db27ad7633b > /var/lib/glusterd/glusterd.info
systemctl stop glusterd
gluster peer status
gluster peer probe node1
gluster volume sync node1
systemctl restart glusterfsd

## new node

mkfs.xfs /dev/sdb1
mkdir /mnt/ssd
echo "/dev/sdb1 /mnt/ssd xfs noatime,discard 1 2" >> /etc/fstab
mount -a

mkfs.xfs -i size=512 /dev/sda3
mkdir /mnt/sda3
echo "/dev/sda3 /mnt/sda3 xfs defaults 1 2" >> /etc/fstab
mount -a
mkdir /mnt/brick/gv0


mkdir /storage
echo "localhost:/gv0 /storage glusterfs defaults,_netdev 0 0" >> /etc/fstab
mount -a

/bin/cp /storage/dockerfiles/docker-enter /usr/local/sbin

gluster volume create gv0 replica 3 node1:/mnt/sda3/gv0 node2:/mnt/sda3/gv0 node3:/mnt/sda3/gv0
gluster volume start gv0

setfattr -x trusted.glusterfs.volume-id /mnt/sda3/gv0
setfattr -x trusted.gfid /mnt/sda3/gv0
rm -rf /mnt/sda3/gv0/.glusterfs

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=108.61.17.99 cidr_netmask=32 op monitor interval=30s

pcs resource create ocf:heartbeat:IPaddr2 ip=108.61.17.99 cidr_netmask=32 op monitor interval=30s
```
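The three `perl -i -pe` one-liners used for the exim config above look cryptic; they rely on perl's slurp mode, where `BEGIN{undef $/;}` makes `-p` read the whole file as one string so `s///smg` can match and insert across line boundaries. A minimal reproduction of the technique on a throwaway file (file name and contents are made up for illustration):

```
# Slurp the whole file, then insert a line after a multi-line-matched anchor.
printf 'alpha\nbegin routers\nbeta\n' > /tmp/slurp-demo.conf
perl -i -pe 'BEGIN{undef $/;} s/(begin routers\n)/$1# inserted\n/smg' /tmp/slurp-demo.conf
cat /tmp/slurp-demo.conf
```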


----------



## silentbogo (Aug 10, 2014)

Thx, W1zz!


----------



## Easy Rhino (Aug 10, 2014)

i was going to ask why not run gluster natively on centos and then read that there is no support for centos 7 and they provide a docker for it. crazy days we live in.


----------



## W1zzard (Aug 10, 2014)

Easy Rhino said:


> i was going to ask why not run gluster natively on centos and then read that there is no support for centos 7 and they provide a docker for it. crazy days we live in.


uhm? we are running glusterfs natively on our servers, on centos7

repo is here: http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/

glusterfs works extremely well and is super robust. really love it. like all cluster filesystems it's slow, especially for web loads (small files), so avoid extra stat() calls by using php opcache with opcache.revalidate_freq and put temporary files on local hdd/ssd/tmpfs
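The opcache part boils down to a couple of php.ini lines; a sketch with illustrative values, not our actual settings:

```
; Sketch of the opcache tuning mentioned above; values are illustrative.
; With revalidate_freq=60, PHP stat()s a cached script on gluster at most
; once per minute instead of on every request.
opcache.enable=1
opcache.validate_timestamps=1
opcache.revalidate_freq=60
```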


----------



## Easy Rhino (Aug 10, 2014)

W1zzard said:


> uhm? we are running glusterfs natively on our servers, on centos7
> 
> repo is here: http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/
> 
> glusterfs works extremely well and is super robust. really love it. like all cluster filesystems it's slow, especially for web loads (small files), so avoid extra stat() calls by using php opcache with opcache.revalidate_freq and put temporary files on local hdd/ssd/tmpfs



oh i see. i read other people just using docker to install and run glusterfs rather than a third party repo.


----------



## W1zzard (Aug 10, 2014)

since gluster has to be up 24/7 and on all our servers i chose to not put it inside docker

Our docker containers:


----------



## Easy Rhino (Aug 10, 2014)

W1zzard said:


> since gluster has to be up 24/7 and on all our servers i chose to not put it inside docker
> 
> Our docker containers:



that is just pure win


----------



## W1zzard (Aug 12, 2014)

Note to self: no matter how often you do the dry run and think you got your method right, always double- and triple-check the results.

In the final move I forgot to convert our databases to InnoDB, so no Galera replication happened. When I rebooted the primary DB node earlier today, another node took over, one that had never seen any DB updates since Saturday...


----------



## Easy Rhino (Aug 13, 2014)

W1zzard said:


> Note to self: no matter how often you do the dry run and think you got your method right, always double- and triple-check the results.
> 
> In the final move I forgot to convert our databases to InnoDB, so no Galera replication happened. When I rebooted the primary DB node earlier today, another node took over, one that had never seen any DB updates since Saturday...



doh! i always write down (copy/paste) the commands i use so when i do it in production there is no question.


----------



## W1zzard (Aug 13, 2014)

So did I, except for the database move, which was a bit tricky .. 
1. dump the whole db
2. fix the dump to not overwrite the mysql table
3. fix the dump to create innodb instead of mysql
4. load the dump
5. sync all db servers

somehow i forgot to do step 3 in the final run (did it in all test-runs)
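Step 3 can be done with a one-liner on the dump; a hedged sketch, assuming the source tables were MyISAM (file and table names are invented for illustration, and a real dump should also be checked for MEMORY tables):

```
# Step 3 of the move above: rewrite the storage engine in the dump
# before loading it. Names here are illustrative only.
cat > /tmp/dump-demo.sql <<'EOF'
CREATE TABLE `session_demo` (`id` INT) ENGINE=MyISAM DEFAULT CHARSET=utf8;
EOF
sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/g' /tmp/dump-demo.sql
grep 'ENGINE=' /tmp/dump-demo.sql
```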


----------



## W1zzard (Aug 14, 2014)

Note to self: don't change a MEMORY table to InnoDB to get it replicated while dozens of inserts and deletes are running on it


```
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] mysqld: Can't find record in 'session_log'
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table techpowerup_ads.session_log; Can't find record in 'session_log', Error_code: 1032; handler error HA_ERR
_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1094, Internal MariaDB error code: 1032
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 120, 10694450
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: Failed to apply app buffer: seqno: 10694450, status: 1
Aug 14 21:15:09 node1 mysqld: #011 at galera/src/trx_handle.cpp:apply():340
Aug 14 21:15:09 node1 mysqld: Retrying 2th time
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] mysqld: Can't find record in 'session_log'
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table techpowerup_ads.session_log; Can't find record in 'session_log', Error_code: 1032; handler error HA_ERR
_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1094, Internal MariaDB error code: 1032
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 120, 10694450
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: Failed to apply app buffer: seqno: 10694450, status: 1
Aug 14 21:15:09 node1 mysqld: #011 at galera/src/trx_handle.cpp:apply():340
Aug 14 21:15:09 node1 mysqld: Retrying 3th time
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] mysqld: Can't find record in 'session_log'
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table techpowerup_ads.session_log; Can't find record in 'session_log', Error_code: 1032; handler error HA_ERR
_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1094, Internal MariaDB error code: 1032
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 120, 10694450
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: Failed to apply app buffer: seqno: 10694450, status: 1
Aug 14 21:15:09 node1 mysqld: #011 at galera/src/trx_handle.cpp:apply():340
Aug 14 21:15:09 node1 mysqld: Retrying 4th time
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] mysqld: Can't find record in 'session_log'
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table techpowerup_ads.session_log; Can't find record in 'session_log', Error_code: 1032; handler error HA_ERR
_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1094, Internal MariaDB error code: 1032
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 120, 10694450
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] WSREP: Failed to apply trx: source: 5508b395-23c8-11e4-9945-bf10217e983b version: 3 local: 0 state: APPLYING flags: 1 conn_id: 1080867 trx_id: 51654636 seqnos (l:
 1102251, g: 10694450, s: 10694449, d: 10694356, ts: 17218294333149)
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] WSREP: Failed to apply trx 10694450 4 times
Aug 14 21:15:09 node1 mysqld: 140814 21:15:00 [ERROR] WSREP: Node consistency compromized, aborting...
```

kabooom!


----------



## VulkanBros (Aug 14, 2014)

Hmmm... maybe a stupid question - this Gluster file system - are you using it to gather all types of internal storage (NAS, HDD, whatever), or can it also be used to aggregate cloud-based or remote storage?


----------



## W1zzard (Aug 14, 2014)

No, it's just to share files locally, basically to replace NFS (which is kinda impossible to scale to multiple active servers)

http://blog.gluster.org/category/geo-replication/
It can do geo replication, not sure how well that works and how slow it is


----------



## VulkanBros (Aug 14, 2014)

Oops, I see - locally, for internal dev.

EDIT: From http://www.gluster.org/documentation/About_Gluster/

"GlusterFS is an open source, distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!)" 

Jesus - 72 brontobytes - that has more zeros in it than I have hairs...


----------



## W1zzard (Aug 14, 2014)

we use it to share web data like php scripts and images, but everything has a caching layer in front because glusterfs is quite slow

Edit: GlusterFS is incredibly robust and its self-heal works better than anything I've ever seen.


----------



## VulkanBros (Aug 14, 2014)

_But why use GlusterFS if it is slow? Because of scalability / robustness / cost? Why not use NFS or SAN?_


----------



## W1zzard (Aug 14, 2014)

NFS doesn't work with multiple write-active servers, SAN is too expensive and even slower.

For large sequential files, GlusterFS works really well and is as fast as your network or local storage (just tested 100 MB/s from HDD, same as local). The problem is small files like web scripts.
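The sequential figure above can be reproduced with a plain `dd` against the gluster mount. A sketch, writing to /tmp here and with a small size; on the real mount, point it at /storage and use a file larger than RAM so the page cache doesn't flatter the number:

```
# Sequential-write sketch; conv=fdatasync makes dd flush to disk before
# reporting the rate. Path and size are illustrative.
dd if=/dev/zero of=/tmp/seqtest bs=1M count=16 conv=fdatasync
```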


----------



## VulkanBros (Aug 14, 2014)

Have you tried FreeNAS? The ZFS filesystem is very robust, and on the right hardware, configured right, it is very fast (~100 MB/s).
We are using it for offloading our VMware backups.


----------



## W1zzard (Aug 14, 2014)

VulkanBros said:


> Have you tried FreeNAS? The ZFS filesystem is very robust, and on the right hardware, configured right, it is very fast (~100 MB/s).
> We are using it for offloading our VMware backups.


ZFS is not a distributed filesystem as far as I know. Also, there's no ZFS on Linux (unless hacked in).

GlusterFS is the best solution for our use case. What happens if you pull the plug of your ZFS server? With GlusterFS the other GlusterFS servers in the cluster will just continue working, the clients will never notice that a plug was ever pulled, they can continue reading and writing. Once the stopped machine comes back up, it will rejoin the cluster, self-heal and magically just work.

For backups, ZFS is a good choice. How often do you scrub your disks? Using deduplication? Online or offline dedup?


----------



## VulkanBros (Aug 14, 2014)

Availability is the key - okay, and yes ZFS is not a distributed filesystem.

We scrub the volumes every 7 days (due to our production cycle)

We have tested deduplication, but found it too resource intensive. The newest ZFS compresses very well, so we use that instead of dedupe.


----------



## W1zzard (Aug 14, 2014)

they have DKMS packages for ZFS now?! yay. No use for those new web servers, but our EU file backup box could definitely use it.

Oh, and if you have to backup lots of very similar small files each day, look into using rsnapshot. It's low-tech but works extremely well to conserve space. We've been using it for over 2 years in production.
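The whole rsnapshot setup is one small config file; a minimal sketch (paths and intervals are illustrative, not our actual config):

```
# Minimal rsnapshot.conf sketch; fields must be TAB-separated, not spaces.
# Unchanged files are hard-linked between snapshots, which is what makes
# it so space-efficient for lots of similar small files.
snapshot_root	/backup/snapshots/
interval	daily	7
interval	weekly	4
backup	root@webserver:/var/www/	webserver/
```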


----------



## VulkanBros (Aug 15, 2014)

Thanks - testing rsnapshot right now!  Another possibility is to use OwnCloud in conjunction with FreeNAS......


----------



## W1zzard (Aug 18, 2014)

Infrastructure changed to have a www frontend (haproxy) that forwards traffic internally between clones, for faster failover in case of node failure. Preliminary tests suggest 0-2 seconds. The frontend itself can move freely between nodes in case it fails on node1.

fear my army of clones
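The frontend/backend split looks roughly like this in haproxy.cfg (names, backend addresses, and check timings are illustrative, not the actual config):

```
# Sketch of the haproxy setup described above; all names, addresses and
# check timings are illustrative. Fast health checks are what gets the
# failover down into the 0-2 second range.
frontend www
    bind 108.61.17.99:80
    default_backend www_clones

backend www_clones
    balance roundrobin
    server node1 10.0.2.1:80 check inter 500 fall 2 rise 2
    server node2 10.0.2.2:80 check inter 500 fall 2 rise 2
```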


----------

