Data scientists often alternate between PySpark and pandas dataframes, depending on the use case and the size of the data being analysed. Because the two APIs look similar but differ in detail, it is easy to mix up the syntax for each type of dataframe. The following cheat sheet provides a side-by-side comparison of the pandas and PySpark syntax needed to accomplish some common programming tasks.
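To give a feel for the kind of comparison meant here, the following is a minimal sketch, not taken from the cheat sheet itself. The sample data and the names `pdf` and `sdf` are hypothetical. The pandas lines run as written; the PySpark equivalents appear as comments because they assume an existing Spark dataframe `sdf` with the same columns.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
pdf = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat", "Dan"],
    "dept": ["A", "B", "A", "B"],
    "salary": [100, 200, 300, 400],
})

# Filter rows on a condition.
# PySpark (assumes a Spark dataframe `sdf`): sdf.filter(sdf.salary > 150)
high_paid = pdf[pdf["salary"] > 150]

# Select a subset of columns.
# PySpark: sdf.select("name", "dept")
names = pdf[["name", "dept"]]

# Group by a column and aggregate.
# PySpark: sdf.groupBy("dept").agg(F.avg("salary"))
#          (with `from pyspark.sql import functions as F`)
avg_salary = pdf.groupby("dept")["salary"].mean()
```

One difference worth keeping in mind: the pandas operations execute immediately, whereas the PySpark transformations are lazy and only run when an action such as `show()` or `collect()` is called.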
Author: My name is Vanessa Afolabi, also known as @TheSASMom. I am a Data Scientist fluent in SAS, R, Python and SQL with a passion for Machine Learning and Research.