Posts

Showing posts from June, 2023

APACHE-HIVE INSTALLATION ON UBUNTU

Apache Hive Installation on Top of Hadoop

Note: Hadoop must already be installed before you start this installation.

Step 1: Download apache-hive-3.1.2 from https://downloads.apache.org/hive/hive-3.1.2/. If you want another or a later version, use https://hive.apache.org/general/downloads/ instead. Then start Hadoop.

Step 2: Once Hive is downloaded, extract the archive and set the Hive environment variables in your .bashrc file. Open the file with "gedit .bashrc" and add these lines:
export HIVE_HOME=<your hive path .......>
export HIVE_CONF_DIR=<your hive conf path.....>
Then run "source .bashrc" to apply the change.

Step 3: Install MySQL, which this setup uses with Hive. Note: Hive also has a built-in RDBMS, Derby. To install the MySQL server, run "sudo apt-get install mysql-server". To log in to MySQL, run "sudo mysql -u root -p".

Step 4: Download mysql co...
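As a rough illustration of the .bashrc step in the excerpt above, the lines below show what the two exports might look like once the placeholders are filled in. The install location under $HOME is an assumption made only for this sketch (apache-hive-3.1.2-bin is the directory name the 3.1.2 binary archive unpacks to), and the PATH line is an optional extra that the excerpt does not mention.

```shell
# Assumed install location: replace with the directory where you extracted Hive.
export HIVE_HOME="$HOME/apache-hive-3.1.2-bin"

# Hive's configuration files live in the conf directory inside the install directory.
export HIVE_CONF_DIR="$HIVE_HOME/conf"

# Optional (not in the original post): lets you run the hive command from any directory.
export PATH="$PATH:$HIVE_HOME/bin"
```

After adding lines like these, "source .bashrc" makes them take effect in the current terminal, and "echo $HIVE_HOME" is a quick way to confirm the value.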

APACHE-SPARK INSTALLATION ON UBUNTU

Spark Installation on Ubuntu

Step 1: Download Java from https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html and extract the tar file with:
tar -zxvf jdk-8u202-linux-x64.tar.gz

Step 2: Download Spark. The direct download link is https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz, which gives you Spark 3.4.0, the latest version at the time of this post. If you want another version, newer or older, go to https://spark.apache.org/downloads.html. Extract the tar file with:
tar -zxvf <your spark path...>
Example:
tar -zxvf /home/user/spark-3.4.0-bin-hadoop3.tgz

Step 3: Open the .bashrc file with "gedit .bashrc" and set your Spark and Java paths:
export JAVA_HOME=/home/user/jdk1.8.0_202
export SPA...
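The extraction commands in the excerpt above assume the archive is where you expect it to be. As a small sketch, the helper below checks for the file first and unpacks it into a directory of your choice. The function name extract_archive and its arguments are hypothetical, introduced only for this example; the tar options are the same ones the post uses, minus -v, plus -C to pick the target directory.

```shell
# extract_archive ARCHIVE [DEST_DIR]
# Hypothetical helper: unpacks a .tgz or .tar.gz archive into DEST_DIR
# (defaults to the current directory).
extract_archive() {
    archive="$1"
    dest="${2:-.}"

    # Stop early with a clear message if the download is missing.
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    mkdir -p "$dest"
    # -z: decompress gzip, -x: extract, -f: archive file, -C: extract into dest
    tar -zxf "$archive" -C "$dest"
}
```

For example, "extract_archive /home/user/spark-3.4.0-bin-hadoop3.tgz /home/user" would unpack the Spark download from the post into /home/user.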

Hadoop Single Node Installation On Ubuntu

WHAT IS BIG DATA

Big data refers to extremely large and complex data sets that cannot be effectively managed, processed, or analyzed using traditional data processing applications. It encompasses the four V's: volume, velocity, variety, and veracity.

Volume: Big data refers to a vast amount of data generated from various sources, such as social media, sensors, transactions, and more. It typically exceeds the capacity of traditional database systems.

Velocity: Big data is generated at high speed and often in real time. Data is continuously produced, collected, and processed rapidly, requiring efficient and timely analysis.

Variety: Big data includes various types of data, such as structured, unstructured, and semi-structured data. Structured data refers to organized data in a fixed format, while unstructured data is more flexible, including text, images, videos, social media posts, and more.

Veracity: Big data can have issues with accuracy, reliability, and trustworthiness. Veraci...