Posts

APACHE-CASSANDRA INSTALLATION ON UBUNTU

Apache Cassandra is a distributed, scalable, high-performance NoSQL database provided by the Apache Software Foundation. It is designed to handle huge amounts of data across many commodity servers, providing high availability without a single point of failure: data is placed on different machines with a replication factor greater than one.

Step 1: Before installing Cassandra you need to install Java 1.8 or higher: https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html

Step 2: Python is needed to run Cassandra, so install Python 2.7:
sudo apt-add-repository universe
sudo apt update
sudo apt install python2-minimal
python2 -V

Step 3: Download Cassandra 3.11.15: https://cassandra.apache.org/_/download.ht...
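As a rough sketch of the steps that follow the truncated download link (the tarball name and extraction path below are assumptions based on Cassandra's usual packaging, not taken from the post):

tar -zxvf apache-cassandra-3.11.15-bin.tar.gz    # extract the downloaded tarball
cd apache-cassandra-3.11.15
bin/cassandra -f                                 # start Cassandra in the foreground
bin/cqlsh                                        # in another terminal, open the CQL shell

bin/cassandra and bin/cqlsh are the standard launch scripts shipped inside the tarball; the -f flag keeps the server in the foreground so you can watch the startup logs.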

Data Migration Using Apache Sqoop

Data Migration Using Apache Sqoop

Sqoop: SQL + Hadoop = Sqoop. Sqoop is one of the components of the Hadoop ecosystem; it was initially developed and maintained by Cloudera. Sqoop is a data pipeline tool used to transfer data between an RDBMS (Relational Database Management System) and Hadoop, and it works with any RDBMS.

There are three ways to import from an RDBMS to Hadoop:
1) RDBMS to HDFS
2) RDBMS to Hive
3) RDBMS to HBase

We can export to an RDBMS from HDFS and Hive only:
1) HDFS to RDBMS
2) Hive to RDBMS

Here we use MySQL (RDBMS) commands for import:

1) MySQL to HDFS
bin/sqoop import --connect jdbc:mysql://localhost/db --username root --password 123 --table Persons -m 1

2) MySQL to Hive
bin/sqoop import --connect jdbc:mysql://localhost/db --username root --password 123 --table test --hive-table mysqltohive --create-hive-table --hive-import -m 1

3) MySQL to HBase
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table...
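The excerpt stops before showing an export command. A minimal sketch of the HDFS-to-RDBMS direction (the Persons table, db database, and export directory here are hypothetical; an empty target table must already exist in MySQL):

bin/sqoop export --connect jdbc:mysql://localhost/db --username root --password 123 --table Persons --export-dir /user/data/persons -m 1

Here --export-dir points at the HDFS directory holding the data, and Sqoop maps its records onto the columns of the existing Persons table.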

APACHE-HIVE INSTALLATION ON UBUNTU

Apache-Hive Installation on Top of Hadoop

Note: Before this installation you should have Hadoop installed.

Step 1: https://downloads.apache.org/hive/hive-3.1.2/ is the download link for apache-hive-3.1.2; through that link you can download Hive. If you want any other or later version, use the link https://hive.apache.org/general/downloads/. Now start Hadoop.

Step 2: Once you have downloaded Hive, extract the Hive file and set the Hive environment paths in your .bashrc file. The command to open the .bashrc file is "gedit .bashrc". Add the paths to your .bashrc file:
export HIVE_HOME=<your hive path .......>
export HIVE_CONF_DIR=<your hive conf path.....>
Then execute the command "source .bashrc".

Step 3: Install MySQL; it is needed for Hive. (Note: Hive has an inbuilt RDBMS, Derby.) The command to install mysql-server is "sudo apt-get install mysql-server". If you want to log in to MySQL, the command is "sudo mysql -u root -p".

Step 4: Download mysql co...
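Assuming Hive was extracted to /home/user/apache-hive-3.1.2-bin (a hypothetical path, not given in the post), the .bashrc entries from step 2 would look like:

export HIVE_HOME=/home/user/apache-hive-3.1.2-bin     # where the Hive tarball was extracted
export HIVE_CONF_DIR=$HIVE_HOME/conf                  # Hive's bundled conf directory
export PATH=$PATH:$HIVE_HOME/bin

Once the metastore database is configured (the excerpt is cut off before that step), the schema is usually initialized once with Hive's schematool, e.g. "schematool -dbType mysql -initSchema", and Hive is then started with the "hive" command.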

APACHE-SPARK INSTALLATION ON UBUNTU

SPARK INSTALLATION ON UBUNTU

Step 1: Download Java using the link below:
https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html
Extract the tar file using the command:
tar -zxvf jdk-8u202-linux-x64.tar.gz

Step 2: Download Spark. Here is the direct download link:
https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz
In that link you get the latest Spark version, 3.4.0, which you can click and download. If you want other versions, the link below lists the latest and older releases:
https://spark.apache.org/downloads.html
Extract the tar file using the command:
tar -zxvf <your spark path...>
Example: tar -zxvf /home/user/spark-3.4.0-bin-hadoop3.tgz

Step 3: Open the .bashrc file using the command "gedit .bashrc" and set your Spark and Java paths:
export JAVA_HOME=/home/user/jdk1.8.0_202
export SPA...
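The excerpt is cut off mid-export; as a sketch, the remaining .bashrc entries typically look like this (the Spark path below is an assumption matching the example extraction above, not taken from the post):

export JAVA_HOME=/home/user/jdk1.8.0_202                  # extracted JDK directory
export SPARK_HOME=/home/user/spark-3.4.0-bin-hadoop3      # extracted Spark directory (assumed path)
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin

After running "source .bashrc", the install can be verified by launching spark-shell from a new terminal.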

Hadoop Single Node Installation On Ubuntu

WHAT IS BIG DATA

Big data refers to extremely large and complex data sets that cannot be effectively managed, processed, or analyzed using traditional data processing applications. It encompasses the four V's: volume, velocity, variety, and veracity.

Volume: Big data refers to a vast amount of data generated from various sources, such as social media, sensors, transactions, and more. It typically exceeds the capacity of traditional database systems.

Velocity: Big data is generated at high speed and often in real time. Data is continuously produced, collected, and processed rapidly, requiring efficient and timely analysis.

Variety: Big data includes various types of data, such as structured, unstructured, and semi-structured data. Structured data refers to organized data in a fixed format, while unstructured data is more flexible, including text, images, videos, social media posts, and more.

Veracity: Big data can have issues with accuracy, reliability, and trustworthiness. Veraci...