This coursework is an individual assignment. You must implement it independently.
This coursework is a mini project that applies machine learning techniques to analyse a
real-world data set. You are asked to implement several machine learning tasks using the data from
the "Stack Overflow 2024 Annual Developer Survey". This survey will help you understand the state
of software developers; you can read more about the survey here: Stack Overflow Developer
Survey 2024. The dataset can be downloaded from the Stack Overflow CDN and imported into
[login to view URL]
Kaggle:[login to view URL]
The following tasks are required in the coursework.
1) Implement exploratory data analysis and gain an understanding of the data set and
its features. These concepts will be covered as part of the module content.
2) Implement cluster analysis and understand the characteristics of developers in each
cluster.
3) Implement classification and build machine learning models for predicting whether
a developer has a high income (compensation) based on the developer’s information.
Note: for developers in this data set, low compensation is defined as annual
compensation of less than $55,000; anything else counts as high compensation. The
compensation is reported in different currencies depending on the developer’s
location. However, the converted value in USD is in column DI: ConvertedCompYearly.
4) Implement a regression model that predicts the salary of a person given some
attributes. Compare your result with the actual salary and ensure that the prediction
error is as small as possible.
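As a minimal sketch of the labelling rule in the note on compensation above, the snippet below derives a binary target from ConvertedCompYearly using the $55,000 threshold. The toy DataFrame, the `HighComp` column name and the `label_high_comp` helper are illustrative assumptions, not part of the survey data. Dropping rows with missing compensation is only one possible choice.

```python
import numpy as np
import pandas as pd

THRESHOLD_USD = 55_000  # from the coursework note: below this value is "low"


def label_high_comp(df: pd.DataFrame, col: str = "ConvertedCompYearly") -> pd.DataFrame:
    """Return a copy with a binary HighComp column (1 = high, 0 = low)."""
    # Assumption: rows without a compensation value cannot be labelled, so drop them.
    out = df.dropna(subset=[col]).copy()
    out["HighComp"] = (out[col] >= THRESHOLD_USD).astype(int)
    return out


# Hypothetical stand-in for the survey file
toy = pd.DataFrame({"ConvertedCompYearly": [30_000, 55_000, 120_000, np.nan, 54_999]})
labelled = label_high_comp(toy)
print(labelled["HighComp"].tolist())  # -> [0, 1, 1, 0]
```

Note that exactly $55,000 is labelled high, because the brief defines low as strictly less than the threshold.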
A report with 15 pages is recommended. The report in total, however, must not exceed 20
pages (excluding title page, contents page, references, and appendices) with the font Arial and
size 10 in the main text. A penalty of a single grade will be incurred if you exceed the 20-page
limit. You may put extra information in appendices which is not counted in the 20-page limit.
You are asked to write the report with the provided report template at the end of the template.
It is recommended to cite and list references using the Harvard referencing style (see
[login to view URL]). However, other (author, year)
styles like APA are also accepted.
By the submission deadline, you are expected to submit both your report (in MS Word or PDF
format) and your Python source code (in *.ipynb or *.py format) to NOW Dropbox.
Your work will be assessed according to the assessment criteria provided in Section II.
The remainder of this specification provides you with detailed requirements for each area of
content – you should read it very carefully.
1. Introduction
State the coursework tasks and state the insight you intend to gain in the coursework.
Introduce the Cross-Industry Standard Process for Data Mining (CRISP-DM)
methodology. Explain its application and importance with appropriate reference to
the literature.
Discuss how you are applying CRISP-DM to the project in your coursework.
2. Data Understanding, Data Preprocessing, Exploratory Data Analysis
Describe the background information of the data, such as how the data were collected
and what the purpose of the data is.
There are many columns in the data set. Select appropriate features from the columns
for your analysis with justification. Obviously, some features have a bigger impact on
the compensation than others. You will decide by yourself which features should be
adopted in your project.
Describe the selected features such as (though not limited to) their name, description,
and data type. For numeric attributes, provide descriptive statistics. It is sufficient to
describe only those attributes used in your analysis. Select at least three features
plus the target variable Compensation. For a Distinction coursework, it is expected
that at least seven predictors will be selected, including both numeric and categorical
features.
Describe the quality of the data set, such as (though not limited to) determination of
the number of all flawed instances, which include duplicate or conflicting instances as
well as instances with missing values, erroneous values, or outliers.
If any duplicate or conflicting instances, missing values, erroneous values, or
outliers exist, demonstrate how you clean them.
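One way to count the flawed instances described above is sketched below. The `quality_report` helper, the toy values and the 1.5 × IQR outlier rule are assumptions chosen for illustration. Other outlier definitions (for example z-scores or domain limits) may suit the survey data better.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame, numeric_col: str) -> dict:
    """Count duplicate rows, rows with missing values, and IQR outliers."""
    q1, q3 = df[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = (df[numeric_col] < lower) | (df[numeric_col] > upper)
    return {
        "duplicates": int(df.duplicated().sum()),
        "rows_with_missing": int(df.isna().any(axis=1).sum()),
        "outliers": int(outliers.sum()),
    }


# Hypothetical toy data: one repeated row, one missing value, one extreme salary
toy = pd.DataFrame({
    "YearsCodePro": [1, 3, 5, 5, 7, 9, np.nan],
    "ConvertedCompYearly": [40_000, 50_000, 60_000, 60_000, 70_000, 1_000_000, 55_000],
})
report = quality_report(toy, "ConvertedCompYearly")
print(report)  # -> {'duplicates': 1, 'rows_with_missing': 1, 'outliers': 1}

# One simple cleaning choice: drop exact duplicates, then rows with missing values
cleaned = toy.drop_duplicates().dropna()
```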
Conduct the exploratory data analysis for understanding the data, for example (though not
limited to): identify outliers using a histogram or box plot; visualise the distribution of
one categorical attribute using a pie plot or bar plot; explore the relationship between
two features using a scatter plot; explore the relationship among three features using a
scatter plot.
3. Cluster Analysis
Describe the process of data transformation and normalization used in cluster analysis.
Perform cluster analysis of the data set using some clustering methods (such as k-Means
and hierarchical clustering). Implement cluster analysis using Python language. Describe
parameter setting, initialisation, stopping criterion and discuss how you choose the
optimal number of clusters.
Describe the characteristics of each cluster that are generated in cluster analysis.
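The clustering steps above (normalisation, k-Means, choosing the number of clusters, describing clusters) could be sketched as follows. The synthetic two-feature data, the column names and the use of the silhouette score to pick k are assumptions for illustration. The elbow method or a dendrogram are equally valid choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: (years of experience, yearly compensation) in three groups
centres = np.array([[2, 30_000], [10, 80_000], [25, 150_000]])
X = np.vstack([c + rng.normal(scale=[1.0, 5_000], size=(50, 2)) for c in centres])

# Standardise so that compensation does not dominate the Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

# Try several values of k and keep the one with the highest silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)

# Describe each cluster by its feature means in the original units
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
profile = pd.DataFrame(X, columns=["years", "comp"]).groupby(final.labels_).mean()
print(best_k)
print(profile)
```

Here `n_init=10` restarts k-Means from ten random initialisations, and the default stopping criterion (`max_iter` and `tol`) is left unchanged.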
4. Machine Learning for Classification and its Implementation
Describe the workflow of machine learning for classification with a flow-chart.
State and describe classification methods that are used in your coursework. The
methods may be chosen from those taught in this module, such as k-Nearest Neighbour,
Decision Trees, Logistic Regression. It is also allowed to choose methods that are not
taught in this module. For a Distinction coursework, at least 3 classifiers should be
chosen for classification.
State and describe the regression models that are used for the salary estimation. You
may choose from the methods covered in the module as well as those not covered in
this module.
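A hedged sketch of comparing two regression models on held-out data follows. The synthetic salary formula, the single `years` feature and the choice of linear regression versus a random forest are illustrative assumptions, not a recommendation for the survey data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in: salary grows linearly with experience, plus noise
years = rng.uniform(0, 30, size=400)
salary = 30_000 + 3_000 * years + rng.normal(scale=5_000, size=400)
X = years.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, salary, test_size=0.25, random_state=42
)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Mean absolute error between predicted and actual salary on unseen data
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))
print(mae)
```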
Describe parameter setting in your classification and regression method(s).
Describe the process of data transformation and normalization for the tasks.
Build and implement machine learning models and tune hyper-parameters in these
models for good performance. You may implement these models using Scikit-Learn
modules or other Python libraries that are not taught in this module.
Implement ensemble learning for classification. Describe the ensemble method(s) that
you are using.
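The classification workflow described in this section (transformation, three classifiers, hyper-parameter tuning, ensemble) might look like the sketch below. The synthetic data, the column names `years` and `remote`, the soft-voting ensemble and the small parameter grid are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "years": rng.uniform(0, 30, n),
    "remote": rng.choice(["Remote", "Hybrid", "In-person"], n),
})
# Synthetic label: more experience makes "high compensation" (1) more likely
y = (df["years"] + rng.normal(scale=4, size=n) > 12).astype(int)

# Scale numeric features and one-hot encode categorical ones
prep = ColumnTransformer([
    ("num", StandardScaler(), ["years"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["remote"]),
])
# Soft voting averages the predicted probabilities of the three classifiers
ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=1)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
pipe = Pipeline([("prep", prep), ("clf", ensemble)])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=1, stratify=y
)
grid = GridSearchCV(
    pipe,
    param_grid={
        "clf__knn__n_neighbors": [3, 7, 11],
        "clf__tree__max_depth": [2, 4, None],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_, round(test_accuracy, 3))
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted only on the training folds during cross-validation, which avoids leaking test information.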
5. Evaluation of Machine Learning Models
Evaluate and compare the performance of different machine learning models. You
should use at least one of the performance metrics (as appropriate), such as
accuracy, confusion matrix, recall and precision, Receiver Operating Characteristic
curve (ROC curve), error rate, etc.
Explain results using appropriate tables or figures.
Critically review which model performed best and how hyper-parameter tuning changes
the performance of the models.
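The metrics named above can be computed with scikit-learn as in this small sketch. The label and score vectors are made-up values chosen so the results can be checked by hand. They are not results from the survey.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth, hard predictions and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class
accuracy = accuracy_score(y_true, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # area under the ROC curve

print(cm.tolist())                  # -> [[3, 1], [1, 3]]
print(accuracy, precision, recall)  # -> 0.75 0.75 0.75
print(auc)                          # -> 0.9375
```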
6. Discussions and Conclusions
Summarise your work and findings in this mini project, such as how the selected
features influence the developers’ compensation.
Describe what kind of insight you have gained from this module.
Explain whether and how well the module has developed your understanding of AI and
Machine Learning.
Finally, it must be pointed out that there exist some online Jupyter notebooks on this data set.
It is allowed for you to study these notebooks, but you must implement your own code in your
coursework and cite these notebooks in your bibliography (if you used any). While you can use
ChatGPT or other Large Language Models (LLMs) to better understand the module and
assessment, you should be careful not to copy the content as this will be flagged as generated
by ChatGPT and could lead to academic irregularities. Therefore, you should ensure that the
report and implementation are your own work.
You have one chance to check the similarity of your work by submitting your report and
source code to the Draft folder on NOW Dropbox. The Turnitin similarity score should be somewhere
around 30% for the report and around 60% for the code.
Hi Fatjon,
Thank you for considering my proposal. With over 8 years of experience in Report Writing and Research Writing, I am well-equipped to assist you with your project on Machine Learning Analysis of Developer Survey. I have carefully reviewed the project requirements and believe I can provide valuable insights and deliver high-quality results.
I would like to connect with you in chat to discuss the project further and clarify any details. Your project presents an exciting opportunity to apply machine learning techniques to analyze real-world data and derive meaningful conclusions.
Looking forward to discussing this project with you in more detail.
Regards
I am a skilled Python developer and experienced Machine Learning Engineer, capable of delivering solutions tailored to meet your specific requirements. Let me help you achieve your goals with precision and efficiency
Hello, I trust you're doing well.
I am well experienced in machine learning algorithms, with nearly a decade of hands-on practice. My expertise lies in developing various
artificial intelligence algorithms, including the one you require, using Matlab,
Python, and similar tools. I hold a doctorate from Tohoku University and have a
number of publications in the same subject. My portfolio, which showcases my past
work, is available for your review. Your project piqued my interest, and I would be
delighted to be part of it. Let's connect to discuss in detail. Warm regards.
please check my portfolio link: https://www.freelancer.com/u/sajjadtaghvaeifr
Hi, I can do it in the NEXT FEW HOURS by working on it right now. I would like to help you at a LOW PRICE, HIGH-QUALITY work delivered before TIME. Message me for detailed discussion. Thanks and Looking forward:).
I guarantee 100% satisfaction of my clients and provide Turnitin plagiarism reports. Please see my profile and reviews from my previous clients.
https://www.freelancer.com/u/DevBenchHQ
Hi there, I'm bidding on your project "Machine Learning Analysis of Developer Survey".
I have read your project description and I'm an expert in Machine Learning/Python/C++/Java and Data Science, therefore I can do this project for you perfectly. I still have a few questions. Please leave a message on my chat so we can discuss the budget and deadline of the project.
Thanks.
Hello
I’m confident I can assist with your project, delivering high-quality, unique content crafted without plagiarism or AI tools. With 7 years of experience in professional writing, I excel at managing complex tasks under tight deadlines and consistently produce detailed, high-caliber work. I can deliver up to 10 pages daily, ensuring promptness and thorough attention to detail.
Feel free to award the project and reach out to me here for seamless collaboration: https://freelancer.com/u/expertshut
Let's create something exceptional together!
Best regards,
Saira
"Machine Learning Analysis of Developer Survey"
I am an award-winning content and copywriter passionate about creating engaging and compelling blogs, articles, business and sales plans, marketing plans, feasibility studies, technical writing, creative writing, speech writing, online writing, etc. I've been privileged to work with people and businesses of all backgrounds. I aim to provide impactful, reader-friendly, and compelling content for your needs. With more than 4 years of experience in this field, I provided many top-notch solutions to clients according to their needs and requirements. it is my responsibility to provide you with plagiarism-free, error-free solutions which help in achieving good grades and a reputation in your batch. A strong vocabulary, attention to detail, clarity, and openness to changes/suggestions make me a good fit for this project.
Here is my portfolio you can check my achievements: https://www.freelancer.com/u/QualityHub
If your project is deleted, hit me with this link: https://www.freelancer.com/u/QualityHub I await your text.
Hi,
I can help you to complete your desired project, delivering high-quality results at the lowest guaranteed budget without plagiarism or AI tools. I am a Doctor of Education with 5 years of experience as a professional writer. I can handle complex tasks under tight deadlines, delivering up to 10 pages daily, ensuring promptness and thorough attention to detail.
Feel free to award the project and contact me here for seamless collaboration: https://www.freelancer.com/u/WinsomeWriting
Best regards,
Faaraz
Hello,
I am Muhammad Asad, a PCAP certified Python developer available immediately to assist with your Machine Learning Analysis of the Developer Survey project. With extensive experience in data analysis and machine learning, I am well-equipped to implement exploratory data analysis, cluster analysis, classification, and regression modeling as outlined in your coursework requirements.
I have successfully completed similar projects, notably a sentiment analysis of Twitter data and a classification model predicting housing prices. These projects enhanced my ability to manipulate complex datasets and apply machine learning algorithms effectively.
I am eager to help you derive meaningful insights from the Stack Overflow 2024 Annual Developer Survey dataset, ensuring the report is comprehensive and adheres to the specified guidelines.
What specific insights or outcomes are you hoping to achieve from this machine learning analysis of the Developer Survey data?
Thanks,
Muhammad Asad
Hi dear,
I am an experienced data analyst and machine learning professional, skilled in Python and advanced analytical techniques. I can implement exploratory data analysis, clustering, classification, and regression tasks using the Stack Overflow 2024 Developer Survey dataset. With expertise in data preprocessing, visualization, and building optimized machine learning models, I will ensure comprehensive insights and minimal error rates. I adhere strictly to academic guidelines, delivering original work free from AI tools like ChatGPT. My report will be well-structured, aligned with CRISP-DM methodology, and formatted as per the requirements. Let’s collaborate to achieve outstanding results.
Hi there, I am a data scientist and a professional responsible for extracting actionable insights and knowledge from large volumes of data. As an experienced Data Scientist in machine learning, I am highly proficient in Python and deeply understand algorithms and data structures. My skills make me an excellent fit for your project, as I can guide you through comprehensive coverage of data structures and algorithms while providing patient and thorough explanations. I have over 12 years of experience with Python libraries and tools including Pandas, Keras, TensorFlow, NumPy, PyCharm, PyTorch, OpenCV, NLP, and others.
With over a decade of experience under my belt, including expertise in Natural Language Processing (NLP), Neural Networks, CNNs, RNNs, LSTM, and GANs, to mention a few, I can provide you not only with knowledge but also with how to apply it efficiently. Partnering with me ensures you have a patient, knowledgeable, and skilled tutor dedicated to your success in this field.
My top priority is to provide high-quality work.
https://www.freelancer.com/u/GdevDataScience
Let's discuss this further via chat, and I'll start your project right now.
Thanks
Gdev
Hi Good afternoon
This is Umair
You can see clearly from my profile that all my reviews/feedbacks are 5 stars and that's for a sole reason that I only take those projects which are doable for me.
I am very much familiar with Machine Learning (ML), Research Writing, Artificial Intelligence, Algorithm and Report Writing. I have done similar projects before. Let's have a quick chat on this project to clear further details and I will give you development feedback as soon as possible. I am a Full time developer and can work on Machine Learning (ML), Research Writing, Artificial Intelligence, Algorithm and Report Writing. Looking forward to working with you.
Thanks
Umair Anwar.
Hello, my name is Asif, and I specialize in creating high-quality content for various projects. With strong expertise, attention to detail, and a focus on clear communication, I can help showcase the value of your products, services, or ideas. Motivated and reliable, I have a proven track record of meeting deadlines and exceeding expectations. Let’s collaborate to bring your vision to life—share your needs, and we can get started!
To ensure I deliver the best solution for your project, I’d love to know:
- What specific insights or trends are you hoping to uncover from the developer survey data?
- Are there any particular machine learning algorithms or tools you would like to use for the analysis?
- Do you have any preferences for how the results should be presented (e.g., visualizations, reports)?
Hello there,
I have thoroughly reviewed the project requirements for the Machine Learning Analysis of the Developer Survey. My proposed project plan includes implementing exploratory data analysis, cluster analysis, classification, and regression models based on the "Stack Overflow 2024 Annual Developer Survey" data. I will ensure to follow the CRISP-DM methodology and provide a detailed report within the specified page limits.
I invite you to review my portfolio for a better understanding of my expertise and past projects:
https://www.freelancer.ca/u/DGM999
Please feel free to initiate a chat to discuss further details and kickstart this project.
Sincerely,
Sadat Saeed
⭐⭐⭐⭐⭐ Machine Learning for Developer Survey Insights
❇️ Hi There, I hope you're well. I've reviewed your project details and it's clear you need help with machine learning analysis on the "Stack Overflow 2024 Annual Developer Survey" data. Zohaib is here to assist! With extensive experience in machine learning, I can help you derive valuable insights from this survey data, meeting all your project requirements.
➡️ Why Me?
With 5 years of experience in machine learning and a strong grasp on data analysis, cluster analysis, classification, and regression models, I am well-equipped to handle this project efficiently. My expertise also extends to ensemble learning and performance evaluation for machine learning models.
➡️ Let's have a quick chat to dive deeper into your project needs. I can share samples of my previous work to showcase my ability to deliver quality results. Looking forward to discussing this with you.
➡️ Skills & Experience:
✅ Data Analysis
✅ Cluster Analysis
✅ Classification Models
✅ Regression Models
✅ Ensemble Learning
✅ Performance Evaluation
✅ Python Programming
✅ Data Preprocessing
✅ Machine Learning Algorithms
✅ Model Optimization
✅ Report Writing
✅ CRISP-DM Methodology
Awaiting your response!
Best Regards,
Zohaib
Hi there,
How are you? I have had a look at your project and I can handle it well, as I have experience in Research Writing, Machine Learning (ML), Report Writing, Artificial Intelligence and Algorithms. I have worked on similar projects before too. Please initiate the chat so we can discuss in detail. Waiting for your kind response.
Regards
Hello T0nimemaj,
I am Rajat, a highly skilled developer with expertise in Python, Java, C++, and Artificial Intelligence. Your project on Machine Learning Analysis of the Developer Survey aligns perfectly with my skill set and experience. I have a strong background in data analysis, machine learning, and report writing, making me well-equipped to handle the tasks outlined in your project description.
With my proficiency in Python libraries like Scikit-Learn and TensorFlow, I can implement the necessary machine learning tasks and deliver accurate results. Additionally, my experience in software development and AI will ensure high-quality deliverables within the specified timeframe.
I am eager to discuss how I can contribute to the success of your project. Let's connect and explore how we can
Hi dear client, I am a Data Scientist and Computers and Systems Engineer (CSE) with great knowledge of and enthusiasm for Python program development. I can write clean, validated Python code and make a device-supported .py file. I have over 10 years of experience with Python libraries and tools such as PyTorch, Keras, NumPy, PyCharm, TensorFlow, Pandas, OpenCV, NLP, and others. My top priority is to provide high-quality work. I am willing to fully devote my time and energy to improving the service offered, with timely, accurate and professional results; building trust and a long-term relationship with the customer is my main objective.
https://www.freelancer.com/u/IcsfIT789
Let's discuss this further via chat, and I'll start your project right now.
Thanks
IcsfIT
Greetings!
How are you today?
Thanks for posting this job. I have checked your project description.
✅I'm an AI engineer and developer with 6+ years of experience.✅
I am very familiar with deep learning tools such as TensorFlow, Keras, YOLO,...
I have good hands-on experience working with Python, AI, and Big Data.
I have quite a good knowledge of DL/ML algorithms. My area of expertise is building Image Processing, Classification/Prediction/Clustering, NLP, Mask-RCNN, object recognition, and object detection. I usually use many techniques, libraries, and frameworks such as TensorFlow, Tesseract, and machine learning models such as CNN, DNN, FCRN, SVN etc.
Contact me with all the details and requirements for your project for further discussion. I will provide you dedicated support and follow-up.
Best wishes,
Hello there,
I understand you're working on a machine learning coursework involving the "Stack Overflow 2024 Annual Developer Survey" dataset. I can help guide you through the project, from data exploration and preprocessing to implementing and evaluating machine learning models for classification and regression tasks. We'll work together to understand the data, select relevant features, and build models to predict developer salaries and identify distinct developer groups.
I have experience working with data analysis and machine learning projects, including data exploration, feature engineering, and model building. I can help you understand the concepts and guide you through the implementation process.
Please note that this is a collaborative effort, and I will be assisting you in understanding the concepts and guiding you through the process while you independently complete the coursework.
Let's chat so we can discuss your current understanding of the project and how I can best support your learning journey.
Kind Regards,
Arbaz A :)