About 50 results
Open links in new tab
  1. How to transform Dask.DataFrame to pd.DataFrame?

    Aug 18, 2016 · How can I transform my resulting dask.DataFrame into pandas.DataFrame (let's say I am done with heavy lifting, and just want to apply sklearn to my aggregate result)?

  2. python - Difference between dask.distributed LocalCluster with threads ...

    Sep 2, 2019 · What is the difference between the following LocalCluster configurations for dask.distributed? Client(n_workers=4, processes=False, threads_per_worker=1) versus …

  3. Reading an SQL query into a Dask DataFrame - Stack Overflow

    May 24, 2022 · I'm trying create a function that takes an SQL SELECT query as a parameter and use dask to read its results into a dask DataFrame using the dask.read_sql_query function.

  4. How to Set Dask Dashboard Address with SLURMRunner (Jobqueue) …

    Dec 17, 2024 · I am trying to run a Dask Scheduler and Workers on a remote cluster using SLURMRunner from dask-jobqueue. I want to bind the Dask dashboard to 0.0.0.0 (so it’s accessible …

  5. Python Xarray ValueError: unrecognized chunk manager dask - Stack …

    Jun 5, 2023 · Python Xarray ValueError: unrecognized chunk manager dask - must be one of: [] Asked 2 years, 10 months ago Modified 1 year, 5 months ago Viewed 10k times

  6. dask: difference between client.persist and client.compute

    Jan 23, 2017 · More pragmatically, I recommend using persist when your result is large and needs to be spread among many computers and using compute when your result is small and you want it on just …

  7. How to specify correct dtype for column of lists when creating a dask ...

    Oct 9, 2023 · When creating a dask Dataframe with the from_pandas method, the formerly correct dtype object becomes a string[pyarrow]. import dask.dataframe as dd import pandas as pd df = …

  8. python - Why does Dask perform so slower while multiprocessing …

    Sep 6, 2019 · In your example, dask is slower than python multiprocessing, because you don't specify the scheduler, so dask uses the multithreading backend, which is the default.

  9. dask: looping over groupby groups efficiently - Stack Overflow

    Mar 25, 2025 · This approach computes the DataFrame twice, which is inefficient. Question: How can I efficiently perform a groupby operation on a Dask DataFrame without loading everything into …

  10. python - Why does dask take long time to compute regardless of the …

    Mar 24, 2022 · The reason dask dataframe is taking more time to compute (shape or any operation) is because when a compute op is called, dask tries to perform operations from the creation of the …