    1,026 pyspark rdd jobs found

    I need a Pyspark code that reads join rules from a txt file and applies them in the main program. The joins will be based on multiple columns, with the rules formatted as column names with conditions. The code should only implement inner joins. Ideal candidates should have comprehensive experience in Pyspark, understand how to manipulate data frames, and can write clean, efficient code.
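A minimal sketch of how such a rule file might be parsed, assuming a hypothetical one-rule-per-line format like `orders.customer_id == customers.customer_id` (the actual format would come from the client's txt file):

```python
# Hypothetical sketch: the rule-file format is an assumption, e.g. one rule
# per line such as "orders.customer_id == customers.customer_id".
def parse_join_rules(lines):
    """Parse rule lines into (left_column, right_column) pairs for an inner join."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        left, right = [side.strip() for side in line.split("==")]
        # keep only the column name after the table qualifier
        pairs.append((left.split(".")[1], right.split(".")[1]))
    return pairs

rules = parse_join_rules([
    "orders.customer_id == customers.customer_id",
    "orders.region == customers.region",
])
# In PySpark these pairs would become join conditions, e.g.:
#   cond = [orders[l] == customers[r] for l, r in rules]
#   orders.join(customers, cond, "inner")
print(rules)
```

Keeping the parser separate from the join itself makes the "rules in a txt file" requirement easy to unit-test without a Spark cluster.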

    $13 Average bid
    6 bids

    I'm seeking an expert in SQL Server and automation coding to create a script for row-by-row data comparison testing. Key Requirements: - Expertise in SQL Server - Strong coding skills for automation - Experience with row-by-row data comparison python,sql, pyspark, databricks The data is located in database tables and is in need of thorough testing. The ideal freelancer for this project will have a strong understanding of SQL Server and experience in developing automation scripts for data comparison.
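For illustration, a plain-Python sketch of the row-by-row comparison logic, assuming both tables share a primary key (names here are hypothetical; in practice the rows would come from SQL Server or a PySpark DataFrame):

```python
# Minimal sketch of row-by-row comparison, assuming both tables fit in memory
# as lists of dicts keyed by a shared primary key; the key name is hypothetical.
def compare_rows(source, target, key="id"):
    """Return mismatches: rows present on one side only, or with differing values."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    mismatches = []
    for k in sorted(set(src) | set(tgt)):
        if src.get(k) != tgt.get(k):
            mismatches.append((k, src.get(k), tgt.get(k)))
    return mismatches
```

At scale the same idea is usually expressed as a full outer join on the key with a filter on unequal columns, rather than materializing dicts.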

    $44 Average bid
    27 bids

    I'm seeking assistance to land a Data Engineering role. I specifically need help with technical interview questions and getting comfortable with addressing technical questions. Help me get job ready. - Technical Questions: Focus primarily on technical interview questions. - Mock Interviews: Provide simulated technical interviews to build my confidence and skills. - Areas of Focus: SQL and databases, Azure and PySpark. Please reach out if you have experience in these areas and can as...

    $6 / hr Average bid
    21 bids
    Redshift to PySpark Conversion
    Ended

    I'm looking for a professional to help me convert my Redshift scripts into PySpark. The goal is to transition this code to a cloud-based platform, and I need the new PySpark scripts to replicate certain data transformations. Key Responsibilities: - Convert existing Redshift scripts into PySpark - Ensure all necessary data transformations are replicated, specifically joins and window functions Ideal Skills: - Extensive experience with both Redshift and PySpark - Proficient in data transformations, particularly joins and window functions - Prior experience with cloud-based platforms I'm open to discussing the primary goal of this project further, as this question was skipped in our initial considerations. However, I'm anticipating a focus on perf...

    $15 Average bid
    6 bids

    Seeking an AWS Glue expert to assist with fetching analytical data from Google BigQuery and storing it on S3 in Parquet or CSV format. The job includes setting up an incremental data extraction process that runs daily. Current Status: Query: Already prepared. Connection: The connection to BigQuery is set up and ready from within Glue Studio. Challenge: 1- I need assistan...date-partitioned tables in BigQuery and load data from there incrementally. 2- configure S3 crawler to scan the bucket and push new data to DB Background: I previously implemented this workflow using QlikView script, but I am now transitioning to AWS. Looking for guidance on best practices specific to AWS. Ideal candidates should have: - Extensive experience with AWS Glue, AWS Glue NoteBook (PySpark) and S...

    $131 Average bid
    63 bids

    I'm looking for a freelance resume writer who specializes in tech resumes, particularly for data engineering positions in the UK. My goal is to create a compelling resume that highlights my skills and experiences effectively for mid-level roles. Key Skills to Highlight: - Proficiency in Python and SQL - Experience with Cloud platforms like Azure - Familiarity with Databricks, PySpark, PowerBI, Snowflake, DBT cloud, Azure Data Factory, Synapse Analytics, ETL, CI/CD, Azure DevOPs, C# .Net Most Recent Job Experience: - Data Engineering The ideal candidate for this project should have: - Proven experience in writing tech resumes, particularly for data engineering roles - In-depth understanding of the data engineering field - Ability to articulate and present technical skills and...

    $15 Average bid
    11 bids

    I'm looking for a professional skilled in Pyspark, SQL, and Python for a comprehensive data analysis project. The main focus will be interpreting and extracting insights from our datasets. Ideal Skills and Experience: - Proficient in Pyspark for large scale data processing. - Strong SQL skills for database management and queries. - Advanced Python capabilities, particularly in data analysis libraries such as Pandas and NumPy. - Experience with data visualization tools (e.g., Matplotlib, Seaborn) to present findings. - Prior work in data analysis is highly preferred.

    $17 Average bid
    27 bids

    I'm seeking a professional skilled in Pyspark and SQL to assist with data transformation tasks. Your primary role will be to modify and prepare data for analysis. This may involve cleaning, filtering, aggregating, summarizing or joining multiple datasets. Ideal candidates should have: - Proficiency in Pyspark and SQL - Experience in data transformation - Strong analytical skills - Ability to work independently and meet deadlines.

    $16 Average bid
    17 bids

    To build a pipeline where we obtain data from a table in hadoop server and do some quality checks before updating that data to a postgress table. After that we need to filter that postgress data and do an upsert command and update that particular data into an s3 bucket. The data pipeline involves multiple data sources. The pipeline should include data validation to ensure accuracy and consistency. Ensure data quality checks such as checking the count of rows, comparing the data before and after, and collecting the new data inserted before updating the data to the Postgres table." Please use Apache Airflow for orchestration. The pipeline should run daily. Ensure the pipeline includes advanced data validation like integrity checks and statistical analysis.
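The count and new-row checks described above can be sketched in plain Python (dicts stand in for the Hadoop and Postgres tables; in the real pipeline these would be Airflow task callables):

```python
# Sketch of the quality checks described above (row counts and newly inserted
# keys), using lists of dicts as stand-ins for the source and target tables.
def quality_report(before, after, key="id"):
    """Compare snapshots taken before and after the load step."""
    before_keys = {row[key] for row in before}
    after_keys = {row[key] for row in after}
    return {
        "row_count_before": len(before),
        "row_count_after": len(after),
        "new_keys": sorted(after_keys - before_keys),  # rows inserted by the load
    }
```

In Airflow, a check like this would typically run as its own task between the extract and the Postgres update, failing the DAG run when the report violates expectations.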

    $30 Average bid
    3 bids

    As a senior developer in our team, you will be instrumental in both web application development and da...provide users with a personalized and efficient way of viewing key metrics and performance indicators. - Role-based Access Control to ensure that sensitive information is accessible only to authorized users. - Data Export Capabilities to allow users to easily export data in various formats for further analysis or reporting. Ideal Skills and Experience: - Extensive experience in Django, Python, and PySpark - Proven track record in web application development - Expertise in creating data processing solutions - Experience in developing custom business tools The project needs to be completed in a medium-term time frame. The developer will be highly involved in the design and planni...

    $27 / hr Average bid
    27 bids

    I'm looking for a professional with substantial experience in setting up Azure Databricks. The main goal of this project is to establish a simple, yet efficient Databricks setup on Azure. A single node is sufficient for this task. Once the setup is complete, I need you to run a sample program using PySpark. This will validate the setup and ensure everything is functioning correctly. The primary purpose of this Databricks setup will be focused on data processing and ETL. Ideal skills for this job include: - Extensive knowledge in Azure and Databricks - Proficiency in PySpark - Experience in d...

    $215 Average bid
    22 bids

    I'm looking for an experienced Databricks and PySpark developer to build a simple function that can retrieve data from a csv file in SharePoint and load it into a Databricks DataFrame. The function should take parameters such as SharePoint path, file name, and format, and return a DataFrame with the loaded data. Key Requirements: - The connection between Azure Databricks and the SharePoint site must be configured correctly. - Configuration of security, secrets, network settings, and/or service principles will be necessary. - The function and its configurations must work seamlessly in my corporate environment. - All configurations should utilize Service Principals for security or Oauth. Network Settings: - The function should be compatible with my current use of a Virtual ...

    $108 Average bid
    13 bids

    I'm currently engaged in a data engineering project and I need assistance with data transformation and ETL tasks. A significant portion of this project involves building and designing Directed Acyclic Graphs (DAGs) in Apache Airflow. Ideal Skills: - Proficiency in Python and Pyspark - Extensive experience with AWS services, particularly Glue, Athena, and S3 - Expertise in workflow automation using Airflow - Strong understanding of data transformation and ETL processes The selected freelancer will play a crucial role in ensuring the smooth operation of my project. Your expertise will help facilitate efficient data processing and workflow automation.

    $670 Average bid
    23 bids

    I'm seeking a seasoned Data Engineer with over 7 years' experience, who can manage and govern our data using Unity Catalog. The engineer will need to seamlessly integrate their work with our fully built-out data architecture. Ideal Candidate Should Have: - Strong expertise in Azure Data Factory (ADF), Azure Databricks, and PySpark. - Proficient in SQL, Azure DevOps (ADO), GIT, and has a basic understanding of PowerBI. - Over 2 years' practical experience with Unity Catalog.

    $30 / hr Average bid
    13 bids

    Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following - early morning est 7 am to 9 am ist, daily 2 hours, Zoom call, budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills - Pyspark, SQL, python, aws, Foundry Functions, ...

    $6 / hr Average bid
    10 bids

    Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following - early morning est 7 am to 9 am ist, daily 2 hours, Zoom call, budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills - Pyspark, Databricks, snowfla...

    $7 / hr Average bid
    9 bids

    Hi, this is a job support role. Mostly you will be working for 2 hours on a daily basis with the developer on a Zoom call. Please confirm the following - early morning est 7 am to 9 am ist, daily 2 hours, Zoom call, budget approx 400/hr. You will do an initial connect to get an understanding of the work. Billing will start from the second session, once you feel you are comfortable with the work. Please confirm. Required skills - Pyspark, SQL, p...

    $5 / hr Average bid
    12 bids

    I'm seeking a PySpark specialist to assist with data processing and ETL tasks. The primary focus will be on optimizing existing scripts to enhance performance and efficiency. Ideal Skills and Experience: - Proficient in PySpark with extensive experience in data processing and ETL - Strong background in script optimization - Familiarity with data handling from SQL Databases, Cloud Storage, and CSV/Excel files - Excellent problem-solving skills and attention to detail

    $100 Average bid
    14 bids
    Data Engineer / Data Architect
    Ended

    ...Architect / Data Engineer Trainer, you will be responsible for delivering engaging and informative training sessions on a variety of data engineering topics. The ideal candidate has over 15 years of experience and is eager to share their expertise to help others grow in this rapidly evolving field. Key Responsibilities: Develop and deliver training sessions on the following topics: Apache Spark PySpark FastAPI Spark NLP Databricks or Snowflake Integrations with cloud platforms (AWS, GCP) Data virtualization (Starburst) Data modeling (Apache Iceberg, Parquet, JSON) Data Lakehouse architecture (Spark and Flink) Apache Airflow Oracle GoldenGate, Informatica Flask Framework, Docker, Kubernetes Pandas Control-M for scheduling MLOps in Data Engineering and Machine Learning Models Da...

    $29 / hr Average bid
    5 bids
    Pyspark install and setup
    Ended

    Hi, I am hiring you for the task we discussed on freelancer.com calls.

    $12 / hr Average bid
    1 bid

    ...know you can do it on my machine too. Before installation: run the following command in your terminal: pyspark --version. It should show an error confirming that PySpark is not already installed and configured (hence, you cannot just show me a video of a Google Colab with PySpark working). After installation: run the following command in your terminal: pyspark --version. It will show the Spark version 3.5.3, confirming that PySpark is installed. Also, run the following piece of 2-line code: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("PySpark Example").getOrCreate(). It should run without any errors, confirming that PySpark is configured correctly. If you show me a video like this on your mach...

    $103 Average bid
    15 bids

    I need a freelancer to develop a binary classification model that predicts card brands (Visa or MasterCard) based on data from the global BIN database. The model should utilize BIN6 (the first six digits of the card number) alongside other relevant features such as country, terminal, and affiliate for accurate classification. Key Requirements: - Proficiency in Spark, PySpark, and Docker - Extensive experience in machine learning model development - Ability to evaluate model performance based on accuracy - Strong proficiency in Python for data analysis and model development. The dataset provided is clean and ready for use without any need for preprocessing. The ideal candidate will be able to leverage the specified frameworks to create an efficient and effective prediction model. ...
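As a hedged baseline, not the requested ML model: the major networks publish their BIN prefixes (Visa BINs begin with 4; MasterCard uses 51-55 and 2221-2720), so a rule-based check over BIN6 makes a useful sanity benchmark for the learned classifier:

```python
# Rule-based baseline over the first digits of BIN6. This is a sanity check
# against which a trained Spark ML model could be compared, not a replacement
# for the requested model, which should also use country/terminal/affiliate.
def brand_from_bin6(bin6):
    if bin6.startswith("4"):
        return "Visa"
    if 51 <= int(bin6[:2]) <= 55 or 2221 <= int(bin6[:4]) <= 2720:
        return "MasterCard"
    return "Unknown"
```

If the trained model cannot beat this one-feature baseline on held-out data, the additional features are not contributing.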

    $158 Average bid
    27 bids
    Apache Spark Assignment
    Ended

    ...task is to be solved using Spark Ethical Practices Please submit original code only. All solutions must be submitted through the portal. We will perform a plagiarism check on the code and you will be penalized if your code is found to be plagiarized. Software/Language to be used Python 3.10.x Apache Spark v3.5.3 Apache Kafka v3.7.1 Additionally, the following Python libraries are required: pyspark == 3.5.3 kafka-python == 2.0.2 No other libraries are allowed. Include the following shebang at the top of your Python scripts. #!/usr/bin/env python3 Convert your files to executable using the following command chmod +x *.py Convert line breaks in DOS format to Unix format (this is necessary if you are coding on Windows without which your code will not run on our portal). dos2uni...

    $12 Average bid
    1 bid

    *Job Title: Big Data Developer* *Job Description:* We are seeking a skilled Big Data Developer to join our team and work on generating daily job recommendations for our platform. The ideal candidate will have experience with big data technologies and cloud services, particularly within the Oracle Cloud ecosystem. *Responsibilities:* - Develop and optimize PySpark applications for processing large datasets. - Implement machine learning models using PyTorch for recommendation systems. - Manage data storage and processing using Oracle Cloud Infrastructure (OCI) services. - Load processed data into Elasticsearch and MySQL for efficient retrieval. - Automate workflows using scheduling tools like Apache Airflow. - Monitor and improve job performance and resource utilization. *Technolo...

    $11 / hr Average bid
    10 bids

    Implement a robust data warehousing solution on Snowflake to enable data-driven decision-making. Integrate data from SAP and other systems into a centralized Snowflake environment which act as a self-service analytics layer to Empower business users to access and analyze data independently. • Developed Data Ingestion Pipelines from SAP to Snowflake tables using Talend. • Creat...requirements. • Deliver tailored data feeds in various formats to support different downstream applications and services. • Designed and Implemented Glue CI/CD pipeline using GitLab. • Migrate existing Talend jobs to AWS Glue for potential cost savings and improved performance. • Developed Snowflake streamlit app for data sharing with business users. Environment: AWS Glue, Snowflake,...

    $1314 Average bid
    15 bids

    I'm seeking an expert in data analysis using PySpark on AWS. The primary goal is to analyze a large amount of structured data. Key Responsibilities: - Analyze the provided structured data and generate outputs in the given format. - Build classification machine learning models based on the insights from the data. - Utilize PySpark on AWS for data processing and analysis. Ideal Skills: - Proficiency in PySpark and AWS. - Strong experience in analyzing large datasets. - Expertise in building classification machine learning models. - Ability to generate outputs in a specified format.

    $110 Average bid
    6 bids
    hdfs file to rdd
    Ended

    check details

    $100 Average bid
    1 bid

    I am seeking a seasoned data scientist and PySpark expert to develop a logistic regression model from scratch for text data classification using public datasets. Key Requirements: - Build a logistic regression model from scratch(do not use libraries for regression) to classify text data into categories. - Use of Python and PySpark is a must. - Experience with handling and analyzing text data is essential. The model's primary goal will be to classify the data into categories. The successful freelancer will be provided with detailed specifications and project requirements upon awarding. Please, only apply if you have substantial experience in creating logistic regression models and are comfortable working with text data and public datasets.
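A from-scratch logistic regression of the kind requested can be sketched in pure Python with stochastic gradient descent (features here are plain numeric vectors; text would first be vectorized, e.g. as term counts, and in PySpark the updates would be distributed across partitions):

```python
import math

# From-scratch logistic regression via stochastic gradient descent,
# with no regression libraries, as the posting requires.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by per-example gradient steps on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
```

For text classification the vector `x` would hold term frequencies (or TF-IDF weights) for a fixed vocabulary, built with PySpark transformations over the public dataset.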

    $153 Average bid
    17 bids

    I am seeking expert-level training in the following technologies, from basics to advanced concepts: Power BI Azure Cloud Services Microsoft Fabric Azure Synapse Analytics SQL Python PySpark The goal is to gain comprehensive knowledge and hands-on experience with these tools, focusing on their practical application in data engineering. If you or your organization provide in-depth training programs covering these tech stacks, please reach out with course details, duration, and pricing. Looking forward to hearing from experienced professionals!

    $71 Average bid
    12 bids

    ...handling, grouping, sorting, and imputation of data, as well as implementation of advanced data bucketing strategies. The project also requires robust error-handling mechanisms, including the ability to track progress and resume operations after a crash or interruption without duplicating previously processed data. Requirements: Expertise in Python, especially libraries like Pandas, Dask, or PySpark for parallel processing. Experience with time-series data processing and geospatial data. Proficiency in working with large datasets (several gigabytes to terabytes). Knowledge of efficient I/O operations with CSV/Parquet formats. Experience with error recovery and progress tracking in data pipelines. Ability to write clean, optimized, and scalable code. Please provide examples of ...

    $18 - $149
    Sealed
    7 bids

    I'm in need of a professional with extensive PySpark unit testing experience. I have PySpark code that loads data from Oracle ODS to ECS (S3 bucket). The goal is to write unit test cases that will achieve at least 80% coverage in SonarQube quality. You will focus primarily on testing for: - Data validation and integrity - Error handling and exceptions The ideal candidate should: - Be proficient in using PyTest, as this is our preferred testing framework - Have a comprehensive understanding of PySpark - Be able to deliver immediately Please note, the main focus of this project is not on the data transformations that the PySpark code performs (which includes data cleaning and filtering, data aggregation and summarization, as well as data joining and mergin...
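A sketch of PyTest-style tests for the validation rules, with a plain function standing in for the PySpark transformation (real tests would typically build a local SparkSession fixture; all names here are hypothetical):

```python
# Hypothetical stand-in for the data-validation rule under test: required
# fields must be present and non-null. In the real project this would wrap
# the PySpark code that loads Oracle ODS data to the ECS (S3) bucket.
def validate_row(row, required=("id", "amount")):
    return all(row.get(col) is not None for col in required)

def test_valid_row_passes():
    assert validate_row({"id": 1, "amount": 9.5})

def test_missing_field_fails():
    assert not validate_row({"id": 1})

def test_null_field_fails():
    assert not validate_row({"id": None, "amount": 1.0})

# PyTest would collect the test_* functions automatically; calling them
# directly here just demonstrates they pass.
test_valid_row_passes()
test_missing_field_fails()
test_null_field_fails()
```

Keeping validation logic in small pure functions like this is also what makes the 80% SonarQube coverage target realistic, since each branch can be hit without a cluster.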

    $98 Average bid
    11 bids

    I'm seeking a professional with extensive experience in PySpark and ETL processes. The project involves migrating my current ETL job, which sources data from a PySpark database and targets a Data Lake. Key tasks include: - Designing and implementing the necessary PySpark code - Ensuring data is effectively transformed through cleaning, validation, aggregation, summarization, merging and joining. Ideal candidates will have a deep understanding of both PySpark and data transformations. Your expertise will be crucial to successfully migrate this ETL job.

    $39 Average bid
    11 bids

    I'm seeking a skilled PySpark expert to assist with data analysis and transformations on structured data. The task involves: - Utilizing PySpark to manipulate and analyze big data. - Writing efficient PySpark code to handle the task. Ideal candidates should have extensive experience with PySpark and a strong background in data analysis and transformations. Proficiency in working with structured data from sources like CSV files, SQL tables, and Excel files is crucial.

    $163 Average bid
    37 bids

    I am looking for a professional to design a series of training videos for beginners on Python, SQL, Pyspark, ADF, Azure Data Bricks, and Snowflake. The primary goal of these videos is to teach the fundamental principles and techniques associated with each of these technologies. As such, the curriculum for each technology will need to be developed from scratch, ensuring that it covers all the necessary topics in a clear and engaging manner. Key responsibilities include: - Developing a detailed curriculum for each technology - Creating high-quality video content - Providing thorough explanations in PDF format - Incorporating our logo into each video Ideal candidates should have: - A strong background in IT, with a focus on the technologies listed - A proven track record in creat...

    $22 / hr Average bid
    21 bids

    I am looking for a professional to design a series of training videos for beginners on Python, SQL, Pyspark, ADF, Azure Data Bricks, and Snowflake. The primary goal of these videos is to teach the fundamental principles and techniques associated with each of these technologies. As such, the curriculum for each technology will need to be developed from scratch, ensuring that it covers all the necessary topics in a clear and engaging manner. Key responsibilities include: - Developing a detailed curriculum for each technology - Creating high-quality video content - Providing thorough explanations in PDF format - Incorporating our logo into each video Ideal candidates should have: - A strong background in IT, with a focus on the technologies listed - A proven track record in creat...

    $665 Average bid
    21 bids

    I need an AWS Glue job written in PySpark. The primary purpose of this job is transforming data stored in my S3 bucket Ideal Skills: - Proficient in PySpark and AWS Glue - Experience with data transformation and handling S3 bucket data Your bid should showcase your relevant experience and approach to this project.

    $22 / hr Average bid
    94 bids

    I have a PySpark code that requires optimization primarily for performance. Key requirements: - Enhancing code performance to handle large datasets efficiently. - The code currently interacts with data stored in Azure Data Lake Storage (ADLS). - Skills in PySpark, performance tuning, and experience with ADLS are essential. - Understanding of memory management in large dataset contexts is crucial. Your expertise will help improve the code's efficiency and ensure it can handle larger datasets without performance issues.

    $98 Average bid
    16 bids

    ...candidate will have hands-on experience in architecting cloud data solutions and a strong background in data management and integration. If you are passionate about working with cutting-edge technologies and have a proven track record in the Azure ecosystem, We would love to hear from you! Key Responsibilities : -Cloud data solutions using Azure Synapse, Databricks, AzureDataFactory, DBT Python, PySpark, and SQL -Set up ETL Pipelines in Azuredatafactory -Set up data model in Azure data bricks / Synapse -Design and manage cloud data lakes, data warehousing solutions, and data models. -Develop and maintain data integration processes. -Collaborate with cross-functional teams to ensure alignment and successful project delivery. Qualifications : -Good understanding of data warehousing...

    $24 / hr Average bid
    23 bids

    I'm looking for a professional who's proficient in AWS Glue, S3, Redshift, Azure Data Bricks, PySpark, and SQL. The project entails working on data transformation and integration, data analysis and processing, database optimization, infrastructure setup and management, continuous data processing, and query optimization. The expected data volume is classified as medium, ranging from 1GB to 10GB. Ideal Skills and Experience: - Strong experience in AWS Glue, S3, and Redshift - Proficiency in Azure Data Bricks, PySpark, and SQL - Proven track record with data transformation and integration - Expertise in database optimization and query optimization - Experience with managing and setting up infrastructure for data processing - Ability to handle continuous data processi...

    $311 Average bid
    11 bids

    I am looking for a data engineer to help me build data engineering pipelines in Microsoft Fabric using the Medallion Architecture. The primary goal of these pipelines is to perform ELT (Extract, Load, Transform). Key Responsibilities: - Design and implement data engineering pipelines via Microsoft Fabric. - Utilize the Medallion Architecture to optimize data flow and processing. - Creating separate workspaces for each layer and lakehouse - Pyspark to write jobs Ideal Skills and Experience: - Extensive experience with Microsoft Fabric. - Strong understanding and experience with ELT processes. - Familiarity with Medallion Architecture. - Able to work with both structured data and Json. - understand how to connect and work across workspaces and lakehouses...

    $14 / hr Average bid
    Urgent
    22 bids

    I'm looking for a seasoned Databricks professional to assist with a data engineering project focused on the migration of structured data from cloud storage. Key Responsibilities: - Lead the migration of structured data from our cloud storage to the target environment - Utilize Pyspark for efficient data handling - Implement DevOps practices for smooth and automated processes Ideal Skills: - Extensive experience with Databricks - Proficiency in Pyspark - Strong understanding of DevOps methodologies - Prior experience in data migration projects - Ability to work with structured data Please, only apply if you meet these criteria, and can provide examples of similar projects you have successfully completed.

    $76 Average bid
    7 bids

    ...and implement a data processing pipeline on Azure - Ensure the pipeline is capable of handling structured data, particularly from SQL databases - Optimize the pipeline for reliability, scalability, and performance Ideal Skills and Experience: - Extensive experience with Azure cloud services, particularly in a data engineering context - Proficiency in data processing tools such as Scala-Spark, Pyspark - Strong understanding of Unix/Linux systems and SQL - Prior experience working with Data Warehousing, Data Lake, and Hive systems - Proven track record in developing complex data processing pipelines - Excellent problem-solving skills and ability to find innovative solutions to data processing challenges This role is suited to a freelancer who is not only a technical expert in clo...

    $20 / hr Average bid
    29 bids

    I'm in search of an Azure Data Factory expert who is well-versed in Delta tables, Parquet, and Dedicated SQL pool. As per the requirement, I have all the data and specifications ready. The successful freelancer will need to be familiar with advanced transformations as the ETL complexity level is high. It's a plus if you have prior and proven experience in handling such projects. Key Skills Required: - Expertise in Azure Data Factory - PySpark - Deep knowledge of Delta tables, Parquet and Dedicated SQL pool - Familiarity with adv...

    $531 Average bid
    50 bids

    I'm looking for a talented PySpark developer who has experience working with large datasets and is well-versed in PySpark above version 3.0. The primary task involves creating user-defined function code in PySpark for applying cosine similarity to two text columns. Key Requirements: - Handling large datasets (more than 1GB) efficiently - Proficiency in PySpark (above version 3.0) - Experience implementing cosine similarity - A background in health care data is a plus Your primary responsibilities will include: - Writing efficient and scalable code - Applying cosine similarity to two text columns - Ensuring the code can handle large datasets This project is a great opportunity for a PySpark developer to showcase their skills in handling big da...
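    For context, the kind of UDF this posting describes could be sketched as follows. This is a minimal, hypothetical pure-Python core (all names are illustrative, not from the posting) that a PySpark job would register via `pyspark.sql.functions.udf` and apply to the two text columns; for datasets over 1GB, a vectorized `pandas_udf` would typically batch more efficiently than a row-at-a-time UDF.

```python
import math
from collections import Counter

# Hypothetical core function for the cosine-similarity UDF. In the PySpark job
# it would be registered roughly as:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   cosine_udf = udf(cosine_sim, DoubleType())
#   df = df.withColumn("sim", cosine_udf("text_a", "text_b"))
# (column names "text_a"/"text_b" are assumptions for illustration)

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity of two texts over whitespace-token term frequencies."""
    va = Counter((a or "").lower().split())
    vb = Counter((b or "").lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

cosine_sim("big data spark", "big data")  # ≈ 0.816
```

A plain UDF like this serializes one row at a time between the JVM and Python, which is the main scalability concern the posting's ">1GB" requirement raises.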

    $85 Average bid
    9 bids

    I require a highly skilled AWS data engineer who can provide on-demand consultation for my data processing needs. The project involves helping me manage large volumes of data in AWS using Python, SQL, PySpark, Glue, and Lambda. This is a long-term hourly consulting job, where I will reach out to you when I need guidance on any of the following areas: - Data Ingestion: The initial process of collecting and importing large volumes of data into AWS. - Data Transformation: The process of converting and reformatting data to make it suitable for analysis and reporting. - Data Warehousing: The ongoing management and storage of transformed data for analysis purposes. Your role will be to assist me in making critical decisions about data architecture and processing, using the tools and lan...

    $15 / hr Average bid
    54 bids

    I'm on a quest for an expert in big data, specifically in the areas of data storage, processing, and query optimization. The ideal candidate will be required to: - Bring solid experience with PySpark - Manage the storage and processing of my large datasets efficiently. Foremost in this requirement is a dynamic understanding of big data principles as they relate to data storage and processing. - Contribute PostgreSQL expertise by optimizing queries for improved performance and efficiency in accessing stored data. - Use Apache Hive for data summarization, querying, and in-depth analysis. This entails transforming raw data into an understandable format and performing relevant calculations and interpretations that enable insightful decisions. Skil...

    $14 Average bid
    4 bids

    I'm looking for an expert PySpark developer to help manage and process big data sets on AWS. The successful candidate will have strong knowledge of key AWS services such as S3, Lambda, and EMR, and will ingest data from source CSV files into target Delta tables. Tasks include: - Building and managing large-scale data processes in PySpark - Understanding and using AWS services like S3, Lambda, and EMR - Implementing algorithms for data computation Ideally, you'll have: - Expertise in PySpark development - In-depth knowledge of AWS services, specifically S3, Lambda, and EMR - Proven experience in handling and processing big data - A problem-solving approach with excellent attention to detail. Your experience should allow you to hit the ground running on this data p...
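    As a rough illustration of the CSV-to-Delta ingest this posting asks for, the merge ("upsert") semantics can be sketched in plain Python. The real job would use `spark.read.csv` plus Delta Lake's `DeltaTable.merge`, shown here only in comments; every path, key, and name below is a hypothetical stand-in, not taken from the posting.

```python
import csv
import io

# Sketch of the upsert behaviour behind a CSV -> Delta ingest. In PySpark with
# Delta Lake this would look roughly like:
#   df = spark.read.csv("s3://bucket/input.csv", header=True)
#   (DeltaTable.forPath(spark, target_path).alias("t")
#       .merge(df.alias("s"), "t.id = s.id")
#       .whenMatchedUpdateAll()
#       .whenNotMatchedInsertAll()
#       .execute())
# Here the same logic is mocked with a dict keyed by "id" so it is testable
# without a Spark cluster.

def upsert(target: dict, csv_text: str, key: str = "id") -> dict:
    """Merge CSV rows into target keyed by `key`: update on match, insert otherwise."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        target[row[key]] = row
    return target

table = {"1": {"id": "1", "name": "old"}}
upsert(table, "id,name\n1,new\n2,fresh\n")
# row "1" is updated in place, row "2" is inserted
```

The dict stands in for the Delta table's key-matched merge; in practice the join condition, update columns, and partitioning would come from the project's actual specifications.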

    $93 Average bid
    13 bids
    Full Stack Python Developer
    Ended

    Full Stack Python Developer within a growing tech start-up focused on transforming the professional services industry. Our stack includes Python, React, JavaScript, TypeScript, GraphQL, Pandas, NumPy, PySpark and many other exciting technologies, so there's plenty of scope to grow your skills. We're looking for someone experienced with Python, React, Material UI, Redux, Service Workers, FastAPI, Django, Flask, Git and Azure. We'd also like this person to have a proven track record in a full-stack developer or similar role, with strong problem-solving skills, attention to detail and the initiative to get things done. Fully remote team working across the globe, but with a fantastic team culture. This is not a project role but an open-ended requirement, so you really...

    $550 Average bid
    186 bids

    ...using Azure Data Factory (ADF). Optimize data transformation processes using PySpark. Production experience delivering CI/CD pipelines across Azure and vendor products. Contribute to the design and development of enterprise standards. Key knowledge of architectural patterns across code and infrastructure development. Requirements: Technical Skills and Experience: Bachelor’s or master’s degree in computer science, Engineering, Data Science, or equivalent experience, with a preference for experience and a proven track record in advanced, innovative environments. 7-8 years of professional experience in data engineering. Strong expertise in Microsoft Azure data services, particularly Azure Data Factory (ADF) and PySpark. Experience with data pipeline design, deve...

    $10203 Average bid
    15 bids

    Hi, you will be working with the developer for 2 hours daily on a Zoom call. Please confirm the following: - Early morning, approx. 7 am to 9 am IST - Daily 2 hours on a Zoom call - Budget approx. 500/hr Required skills (Data Engineer / Databricks Developer): Python, Spark, PySpark, SQL, Azure cloud, Data Factory, Scala, Terraform, Kubernetes

    $6 / hr Average bid
    5 bids