OS Database Technologies

Sunday, 4 October 2015

JVM Analysis [For cassandra]

Command to know inside memory utilization in jvm.

java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
uintx AdaptivePermSizeWeight = 20 {product}
intx CompilerThreadStackSize = 0 {pd product}
uintx ErgoHeapSizeLimit = 0 {product}
uintx HeapSizePerGCThread = 87241520 {product}
uintx InitialHeapSize := 1057516032 {product}
uintx LargePageHeapSizeThreshold = 134217728 {product}
uintx MaxHeapSize := 16921919488 {product}
uintx MaxPermSize = 85983232 {pd product}
uintx PermSize = 21757952 {pd product}
intx ThreadStackSize = 1024 {pd product}
intx VMThreadStackSize = 1024 {pd product}
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

Monday, 10 November 2014

Cassandra Handy commands

Some handy commands of Cassandra:

To make vertical output from cqlsh.

Use EXPAND ON;

Example :

cqlsh:loadtest> expand on;
Now printing expanded output
cqlsh:loadtest> select * from cmdtest;
@ Row 1
------+------
id | 1
name | amit
(1 rows)

To execute Query from file

SOURCE

cqlsh> use loadtest;
cqlsh:loadtest> source '/tmp/query.sql'
id | name
----+---------
1 | amit
2 | testing
(2 rows)

CAPTURE :
Captures command output and appends it to a file.

To start capturing the output of a query, specify the path of the file relative to the current directory. Enclose the file name in single quotation marks

cqlsh:loadtest> capture '/tmp/a'
Now capturing query output to '/tmp/a'.
cqlsh:loadtest> select * from cmdtest;
cqlsh:loadtest> exit
cassandra@dell27:~> cat /tmp/a
id | name
----+---------
1 | amit
2 | testing
(2 rows)

Thursday, 18 September 2014

Cassandra os-tuning.

OS TUNING - Cassandra

Below settings should be applied to all node of Cassandra cluster.

File Descriptors : Cassandra needs more file descriptor than default settings of 1024, below is how you can change it.

#echo "* soft nofile 32768" | sudo tee -a /etc/security/limits.conf

#echo "* hard nofile 32768" | sudo tee -a /etc/security/limits.conf

#echo "root soft nofile 32768" | sudo tee -a /etc/security/limits.conf

#echo "root hard nofile 32768" | sudo tee -a /etc/security/limits.conf

User Resource Limits :

Added the following entries in /etc/security/limits.conf file.

# End of file

* soft nofile 32768

* hard nofile 32768

root soft nofile 32768

root hard nofile 32768

* soft memlock unlimited

* hard memlock unlimited

root soft memlock unlimited

root hard memlock unlimited

* soft as unlimited

* hard as unlimited

root soft as unlimited

root hard as unlimited

Run command

#sysctl -w vm.max_map_count=131072

Disable Swap - Lets not jvm gets swapped rather get it killed by oom killer. As even swapped jvm eventually will killed by oom. Better node to down then slow.

# sudo swapoff --all

Synchronize Clocks : The column will only get overwrite if recent version has latest time compare to old value.

NTP is already running on server

Tuesday, 16 September 2014

One liner linux

Linux - makes life easy.. lets inspired with Linux

To see a line from big file.

sed -n 'Xp' theFile, where X is your line number.

e.g sed -n '42401p' /tmp/bd/soted_voter2.csv

To get more lines.

sed -n '1127062,+10p' /tmp/gujrat/dhana_gujrat5

To remove a line from big file

sed -i '42401d' /tmp/bd/soted_voter2.csv

Note- If you dont put -i it will show whole output after replacement.

To get Non ascii/foreign characters of a file in red color.

grep --color='auto' -P -n "[\x80-\xFF]" /tmp/bd/soted_voter2.csv

To remove non-ascii chars from a file

tr -cd '\11\12\15\40-\176' < /tmp/vr_votersorted2.copy > /tmp/bd/cleansortedvoter.copy

To grep pattern wise.

To load data into cdb (only tab supported. )

cdbmake-12 jh jh.tmp < /tmp/jhepic1

To append some text at the end of line in a file.

awk -F '`' '{print $1 "~2"}' epicmeta_stripped.csv > /tmp/jhepic

To move 1st 15 files from one dir to another.

find . maxdepth 1 -name '*voice*' | head -n 20 | xargs mv -t /tmp/p5

time perl -pi -e 's/0000-00-00 00:00:00//;' *2014*

To configure ip

ifconfig eth0 192.168.19.200/16 up

To Add gateway via command.

`route add default gw 192.168.2.234

To check which process is eating memory.
ps -e -o pid,vsz,comm= | sort -n -k 2

Monday, 15 September 2014

Cassandra-startups

Trying to brief Cassandra : important stuff to learn to manage it.

Cassandra - In Greek mythology : A Princess who can see the future but got cursed that nobody will believe on her.

Excellent nosql technology through which can you scale the writes and nail down all BCP and Disaster recovery issues.

Some of the important feature are listed below.

Multi Data center support
Multi master mode - peer to peer communication
High Availability
Linear scalability
Collection datatypes
Faster updates
Standard Monitoring Dashboard

Out of Scope

joins
group by
order by
secondary index - not performan

Important concepts

seed-node
Gossips
peer to peer
initial_token
listen_address
rpc_address
Compaction
tombstone
Quorum
consistency
replication factor
snitch

Need to learn:

Installation

Connect to Cassandra (via cql)

create key space/table/index

How to add node

How to remove node

How to repair a cluster

How to backup c* cluster
How to setup monitoring (OPS center)

How to set replication factor
How to set consistency Level (via Driver)
How to set authenticaion in connection

Monday, 3 October 2011

Helping Query for PostgreSQL DBA

How to get column list of a table.

BEGIN;
CREATE AGGREGATE core.array_acc(anyelement) (
    SFUNC = array_append,
    STYPE = anyarray,
    INITCOND = '{}'
);

SELECT relname, array_acc(attname)
    FROM pg_attribute a join pg_class c on a.attrelid = c.oid
  WHERE relname in ('session', 'hits', 'conversion_hits')
        and attname not in ('tableoid','cmax','xmax','cmin','xmin','ctid')
GROUP BY relname;
commit;

relname   | conversion_hits
array_acc |{session_id,permanent_id,conversion_id,sale,price,iso_currency_id,order_code,description,sku,supersedes,user_defined1,user_defined2,user_defined3,user_defined4,document_url,referrer,domain_id,rdate,rtime,last_modified,client_biz_unit,client_section_name,old_price,old_currency_id}
-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname   | hits
array_acc | {session_id,document_url,referrer,domain_id,rdate,rtime,last_modified,client_biz_unit,client_section_name}
-[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname   | sessions
array_acc | {session_id,permanent_id,entry_domain_id,host_ip,actual_ip,geo_worldregion_id,geo_country_id,geo_region_id,geo_location_id,campaign_id,account,category,item,traffic_source_id,keyword,paid_keyword,browser_id,browser_string,os_id,locale,language,returning_visitor,rdate,rtime,referrer,entry_url,latency_days,device_id,device_type_id,match_type_id,search_type_id,b2a_keyword_id,last_modified,redirect_id,media_published_site,media_creative_id,media_landing_page_id,client_section_name,client_biz_unit,track_url,search_engine_result_page,search_engine_position,ad_copy_id}

How to extract schema difference in pgSQL DB

There is a interesting tool available in postgresql to get the differences of various schema of seperate databas. The name of tool is "apgdiff", can be download from this web-link.

Its a java based tool. You need to provide it 2 argument which is schema file location for which you want to get the difference.

Pre-requiste

1. Java installed on your server where apgdiff needs to be run.

2. You need to have both schema file, one would be original where changes needs to be apply, second schema file which is already having latest change.

1st file would be the original schema where you want to make changes
2nd file would be the schema file location where latest changes exist.

Step by step example :

1. Create two database on test machine.

[pgsql@## ~]$ createdb original;
[pgsql@## ~]$ createdb dev;

2. Create a test table at both above defined databases.

[pgsql@## ~]$ psql -Upgsql original -c "create table public.test(id integer)"
CREATE TABLE
[pgsql@## ~]$ psql -Upgsql dev -c "create table public.test(id integer)"
CREATE TABLE

3. Modify the schema at dev databases.
[pgsql@## ~]$ psql -Upgsql dev -c "ALTER TABLE public.test ADD COLUMN name text;"
ALTER TABLE

4. Dump the schema of both database for comparision.
[pgsql@##]$ pg_dump -Upgsql -s original -f original_schema.dump
[pgsql@##]$ pg_dump -Upgsql -s dev -f dev_schema.dump

5. Run the apgdiff tool to get the DIFF sql in below format.
[pgsql@##]$ cd apgdiff-2.3/

[pgsql@i2adb20 apgdiff-2.3]$ java -jar apgdiff-2.3.jar original_schema.dump dev_schema.dump

ALTER TABLE test
ADD COLUMN name text;

[pgsql@##]$

Catch : This tool only extract the schema difference not the permission one.