Hi @abhinav,
Could you please install the ‘parquet-tools’ package on CloudxLab? It would help students like us learn about the Parquet file format.
Hi @raviteja,
Yes, I will install it globally.
In the meantime, please follow this link to set it up locally in your home directory from the web console.
Hope this helps.
Thanks
Thanks @abhinav for the prompt response.
I have already tried installing it locally and ran into the issue below.
Step 1: Cloned the Parquet repository and tried to build it locally using Maven:
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal
This is the error I’m getting:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.091s
[INFO] Finished at: Mon Dec 25 09:01:12 UTC 2017
[INFO] Final Memory: 11M/234M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project parquet-tools: Could not resolve dependencies for project org.apache.parquet:parquet-tools:jar:1.9.1-SNAPSHOT: Failed to collect dependencies for [org.apache.parquet:parquet-format:jar:2.4.0 (compile), org.apache.parquet:parquet-hadoop:jar:1.9.1-SNAPSHOT (compile), org.apache.hadoop:hadoop-client:jar:2.7.3 (compile), commons-cli:commons-cli:jar:1.3.1 (compile), com.google.guava:guava:jar:20.0 (compile), org.slf4j:slf4j-log4j12:jar:1.7.22 (compile), junit:junit:jar:4.12 (test), org.easymock:easymock:jar:3.4 (test), commons-httpclient:commons-httpclient:jar:3.1 (test)]: Failed to read artifact descriptor for org.apache.parquet:parquet-hadoop:jar:1.9.1-SNAPSHOT: Could not transfer artifact org.apache.parquet:parquet-hadoop:pom:1.9.1-SNAPSHOT from/to jitpack.io (https://jitpack.io): peer not authenticated -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Please help me resolve this.
Hi @abhinav,
The above issue has been resolved. Below are the steps to follow for a local installation of ‘parquet-tools’:
Java::
$wget https://github.com/apache/parquet-mr/archive/apache-parquet-1.8.2.tar.gz
$tar -xzf apache-parquet-1.8.2.tar.gz
$cd parquet-mr-apache-parquet-1.8.2/parquet-tools && mvn clean package -Plocal
$java -jar target/parquet-tools-1.8.2.jar schema sample.parquet
Note: The Git clone is a development snapshot (1.9.1-SNAPSHOT) and has a few build issues, so I downloaded the stable release and built it locally.
Now I can start working with Parquet files.
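To avoid typing the full java -jar command every time, the jar can be wrapped in a small shell function. This is only a sketch: the PARQUET_TOOLS_JAR variable and the parquet_tools function name are made up for illustration, and the default path assumes the release was extracted and built in the home directory, so adjust it to wherever the jar actually ended up.

```shell
# Location of the built jar. This default is an assumption; override
# PARQUET_TOOLS_JAR if the jar lives somewhere else.
PARQUET_TOOLS_JAR="${PARQUET_TOOLS_JAR:-$HOME/parquet-mr-apache-parquet-1.8.2/parquet-tools/target/parquet-tools-1.8.2.jar}"

# Hypothetical helper: forwards all of its arguments to the parquet-tools jar.
parquet_tools() {
    if [ ! -f "$PARQUET_TOOLS_JAR" ]; then
        echo "parquet-tools jar not found at: $PARQUET_TOOLS_JAR" >&2
        return 1
    fi
    java -jar "$PARQUET_TOOLS_JAR" "$@"
}
```

After adding this to ~/.bashrc (or pasting it into the current shell), the earlier command becomes: parquet_tools schema sample.parquet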
Python::
To work with Parquet files using Python instead of Java, follow these steps:
$virtualenv parquet-tools
$source parquet-tools/bin/activate
$pip install parquet
$parquet --metadata test.parquet
$parquet test.parquet
Note that the 'parquet' command is available by name only after the virtualenv has been activated.
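If activating the virtualenv in every new session gets tedious, the script that pip places in the environment's bin directory can usually be called by its full path instead. The sketch below assumes the virtualenv was created as parquet-tools in the home directory; the PARQUET_VENV variable and the pq function name are hypothetical.

```shell
# Directory of the virtualenv. The default is an assumption; override
# PARQUET_VENV if the environment was created elsewhere.
PARQUET_VENV="${PARQUET_VENV:-$HOME/parquet-tools}"

# Hypothetical helper: runs the virtualenv's 'parquet' script by full path,
# so the environment does not need to be activated first.
pq() {
    if [ ! -x "$PARQUET_VENV/bin/parquet" ]; then
        echo "parquet command not found in: $PARQUET_VENV/bin" >&2
        return 1
    fi
    "$PARQUET_VENV/bin/parquet" "$@"
}
```

With this defined, pq --metadata test.parquet should behave like the activated command shown above.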