Unveiling the true cost of your ride-sharing and food delivery habits with an ELT data pipeline, PostgreSQL, dbt, and Power BI.

As a regular user of Uber and Uber Eats, I realized that I wanted to gain better insights into how much I spend on these services per month, year, or quarter. As a digital content creator and data engineer, I decided to create a proof of concept (POC) for a data analysis project to track my expenses on these platforms.

In this article, I will walk you through the process of building the "My Uber Project" pipeline. The pipeline follows an ELT (Extract, Load, Transform) approach: extract data from PDF receipts, clean and structure the data, store it in a PostgreSQL database, perform transformations using dbt (Data Build Tool), and finally visualize the results with Power BI.

The first step in the My Uber Project pipeline is to extract data from the PDF receipts received via email after each Uber ride or Uber Eats order. To achieve this, we can use Python libraries such as PyPDF2 or pdfplumber to parse the PDF files and extract the relevant information.

After extracting the raw data, the next step is to clean and structure it. This process involves tasks such as parsing dates, converting currencies, and standardizing column names. The cleaned and structured data will be stored in two separate CSV files:

- uber_receipts.csv: contains information related to Uber rides, with columns type, date, total, and driver.
- uber_eats_receipts.csv: contains information related to Uber Eats orders, with columns type, date, total, and restaurant.

The extraction script looks like this. Note that the regex patterns below are simplified examples; you will need to tailor them to the exact layout of your own receipts.

```python
import pdfplumber
import re
import os
import pandas as pd

def extract_data(pdf_path):
    # Open the PDF and work with the first page only
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        content = page.extract_text()

    # Simplified example patterns; adjust them to match the actual
    # layout of your receipts.
    date = re.search(r"[A-Z][a-z]+ \d{1,2}, \d{4}", content)
    total = re.search(r"Total \$([\d.]+)", content)
    driver = re.search(r"You rode with (.+)", content)
    restaurant = re.search(r"Order from (.+)", content)

    if driver:
        return {"type": "Uber",
                "date": date.group(0) if date else None,
                "total": total.group(1) if total else None,
                "driver": driver.group(1)}
    if restaurant:
        return {"type": "Uber Eats",
                "date": date.group(0) if date else None,
                "total": total.group(1) if total else None,
                "restaurant": restaurant.group(1)}
    return None

uber_data = []
uber_eats_data = []

# "receipts" is the folder holding the downloaded PDF receipts
for filename in os.listdir("receipts"):
    if not filename.endswith(".pdf"):
        continue
    extracted_data = extract_data(os.path.join("receipts", filename))
    if extracted_data and extracted_data["type"] == "Uber":
        uber_data.append(extracted_data)
    elif extracted_data and extracted_data["type"] == "Uber Eats":
        uber_eats_data.append(extracted_data)

uber_df = pd.DataFrame(uber_data)
uber_eats_df = pd.DataFrame(uber_eats_data)
uber_df.to_csv("uber_receipts.csv", index=False)
uber_eats_df.to_csv("uber_eats_receipts.csv", index=False)
```

Here's an explanation of each part of the code:

Import the libraries:

- pdfplumber: to extract text from PDF files.
- re: to perform regular expression operations.
- os: to interact with the operating system, e.g. working with directories and files.
- pandas: to work with data in DataFrame format and save it to CSV.

Define the extract_data function, which takes a PDF file path as input:

a. Open the PDF file using pdfplumber and get the first page.
b. Extract the text content from the page.
c. Use regular expressions to find the date, total, driver (if available), and restaurant (if available) information in the text.
d. If a driver is found, return the extracted data as a dictionary with the 'type' key set to 'Uber'.
e. If a restaurant is found, return the extracted data as a dictionary with the 'type' key set to 'Uber Eats'.

Finally, loop over the receipt PDFs, append each result to the uber_data or uber_eats_data list depending on its type, and save each list to its own CSV file with pandas.
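The cleaning tasks mentioned above (parsing dates and converting currency strings into numbers) can be sketched with pandas. This is a minimal illustration with made-up rows, assuming dates like "June 3, 2023" and totals like "$14.52"; the real receipts may use different formats.

```python
import pandas as pd

# Illustrative rows only, not real receipt data
df = pd.DataFrame({
    "type": ["Uber", "Uber"],
    "date": ["June 3, 2023", "July 9, 2023"],
    "total": ["$14.52", "$23.10"],
})

# Parse the date strings into proper datetime values
df["date"] = pd.to_datetime(df["date"], format="%B %d, %Y")

# Strip the currency symbol so totals become numeric and can be summed
df["total"] = df["total"].str.replace("$", "", regex=False).astype(float)
```

With typed columns like these, monthly or quarterly spend becomes a simple groupby on the date column.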
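For the load step into PostgreSQL, one common approach is pandas' to_sql with a SQLAlchemy engine. The sketch below is an assumption about how the load could be done, not the project's actual loading code; the connection URL, table name, and function name are all hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

def load_csv(csv_path: str, table_name: str, db_url: str) -> int:
    """Load one CSV into a database table, replacing any existing rows.

    db_url would be something like
    'postgresql://user:password@localhost:5432/uber' in the real
    pipeline (an assumed URL, shown only for illustration).
    """
    engine = create_engine(db_url)
    df = pd.read_csv(csv_path)
    df.to_sql(table_name, engine, if_exists="replace", index=False)
    return len(df)
```

In the pipeline this would be called once per CSV (the rides file and the Uber Eats file), so dbt can then pick the tables up as raw sources for its transformations.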