About 50 results
Open links in new tab
  1. What are the pros and cons of the Apache Parquet format compared …

    Apr 24, 2016 · 30,36,2 Parquet files are most commonly compressed with the Snappy compression algorithm. Snappy compressed files are splittable and quick to inflate. Big data systems want to …

  2. How to read a Parquet file into Pandas DataFrame?

    How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data …

  3. How to view Apache Parquet file in Windows? - Stack Overflow

    Jun 19, 2018 · 99 What is Apache Parquet? Apache Parquet is a binary file format that stores data in a columnar fashion. Data inside a Parquet file is similar to an RDBMS style table where you have …

  4. Inspect Parquet from command line - Stack Overflow

    How do I inspect the content of a Parquet file from the command line? The only option I see now is $ hadoop fs -get my-path local-file $ parquet-tools head local-file | less I would like to avoid

  5. What is actually meant when referring to parquet row-group size?

    Jul 27, 2023 · Parquet readers should be capable of applying skips at page-level resolution. Disclaimer: I work on arrow-c++/pyarrow. The pyarrow datasets reader's performance is very dependent on row …

  6. Merge small parquet files into a single large parquet file

    Feb 16, 2023 · I have been trying to merge small parquet files each with 10 k rows and for each set the number of small files will be 60-100. So resulting into around 600k rows minimum in the merged …

  7. python - How to read a list of parquet files from S3 as a pandas ...

    import pyarrow.parquet as pq dataset = pq.ParquetDataset('parquet/') table = dataset.read() df = table.to_pandas() Both work like a charm. Now I want to achieve the same remotely with files stored …

  8. Parquet vs Delta format in Azure Data Lake Gen 2 store

    Dec 16, 2020 · Delta Lake uses versioned Parquet files to store your data in your cloud storage. Apart from the versions, Delta Lake also stores a transaction log to keep track of all the commits made to …

  9. Convert csv to parquet file using python - Stack Overflow

    May 30, 2018 · I am trying to convert a .csv file to a .parquet file. The csv file (Temp.csv) has the following format 1,Jon,Doe,Denver I am using the following python code to convert it into parquet …

  10. What file extension is the correct way to name parquet files?

    Jun 3, 2025 · <file-name>.parquet : 1) This is the standard and most widely accepted naming convention. 2) The compression codec is stored in the Parquet file metadata, not in the filename. 3) …