A competition project that uses deep learning to identify and classify malware. (It was entered in a Chinese university competition.)
The project uses DeepFM, a simple neural network, a random forest, and a final ensemble model that combines these three methods.
The original .ipynb files have been converted to .py files. They can be run directly once the file paths in the scripts point to valid locations.
The training set is not included in the project. Please collect your own data if needed.
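How Ensemble.py combines the three models is not described in this README. As a rough illustration only, the sketch below assumes weighted soft voting: each model outputs per-class probabilities, the probabilities are averaged, and the class with the highest average wins. The function name `soft_vote` and the `weights` parameter are hypothetical and do not come from the project code.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Combine class probabilities from several models by weighted averaging.

    model_probs[m][i][c] is the probability that model m assigns to class c
    for sample i. Returns the index of the winning class for each sample.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal trust in every model
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)

    predictions = []
    # zip(*model_probs) yields, per sample, one probability row from each model.
    for sample_rows in zip(*model_probs):
        n_classes = len(sample_rows[0])
        averaged = [
            sum(w * row[c] for w, row in zip(weights, sample_rows)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions
```

For example, passing the probability outputs of the DeepFM, neural network, and random forest models as the three entries of `model_probs` would return one predicted class index per sample.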
The .pcap captures in the training samples were analyzed with Wireshark. The classes cover many kinds of applications. The network protocols involved are mostly TCP, HTTP, ARP, UDP, and SSH, which are among the protocols most frequently and visibly targeted by malicious attacks. HTTP, which is built on top of TCP/IP, accounts for a large share of traffic on public networks, and the number of attacks against HTTP and web languages such as HTML and PHP has grown accordingly.
The provided packets fall into two parts: "Applications" and "Attack". "Attack" covers three types of attack: XSS (cross-site scripting), SQL injection, and stack buffer overflow. The attacks were carried out at different times against different target sites, and the parameters and instructions passed also differ somewhat within the same attack type. "Applications" contains connection requests and data exchange packets from major network applications, sites, and services. The protocol types in "Applications" are the targets threatened by "Attack", and in sample analysis they serve as the normal baseline for user behavior and network traffic interaction.
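The protocol mix of a capture can also be tallied in a script instead of in the Wireshark GUI. The following is a minimal sketch, not part of the project: it assumes a classic (non-pcapng) capture with Ethernet link type, and it labels HTTP and SSH purely by the well-known ports 80 and 22, which is a simplification. The names `classify_frame` and `protocol_counts` are hypothetical.

```python
import struct
from collections import Counter

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
PORT_LABELS = {80: "HTTP", 22: "SSH"}  # assumption: well-known ports only


def classify_frame(frame: bytes) -> str:
    """Return a coarse protocol label for one Ethernet frame."""
    if len(frame) < 14:
        return "OTHER"
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < 34:
        return "OTHER"
    ihl = (frame[14] & 0x0F) * 4   # IPv4 header length in bytes
    proto = frame[14 + 9]          # 6 = TCP, 17 = UDP
    if proto not in (6, 17):
        return "OTHER"
    l4 = 14 + ihl                  # start of the TCP/UDP header
    if proto == 17:
        return "UDP"
    if len(frame) >= l4 + 4:
        src_port, dst_port = struct.unpack("!HH", frame[l4:l4 + 4])
        for port in (src_port, dst_port):
            if port in PORT_LABELS:
                return PORT_LABELS[port]
    return "TCP"


def protocol_counts(pcap_bytes: bytes) -> Counter:
    """Count protocol labels over all packets of a classic pcap file."""
    if len(pcap_bytes) < 24:
        raise ValueError("file too short for a pcap global header")
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    counts = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
        _, _, incl_len, _ = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        offset += 16
        counts[classify_frame(pcap_bytes[offset:offset + incl_len])] += 1
        offset += incl_len
    return counts
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `protocol_counts` returns a `Counter` whose keys are the labels ARP, TCP, UDP, HTTP, SSH, and OTHER.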
This project uses CICFlowMeter (https://github.com/ahlashkari/CICFlowMeter) as its flow analysis tool. A batch-processing script was used to convert the large number of samples in a uniform way, for efficiency.
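The batch script itself is not included in this README. The sketch below shows one way such a batch step could be organized, under stated assumptions: samples are .pcap files in class sub-folders, each capture produces one CSV, and captures that already have a CSV are skipped. The actual CICFlowMeter invocation is deliberately left out. It is passed in as a hypothetical `convert` callable, and the function names are illustrative only.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def pending_conversions(pcap_dir: Path, csv_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every .pcap under pcap_dir with its target CSV path.

    Captures whose CSV already exists are skipped, so an interrupted
    batch run can be resumed without redoing finished files.
    """
    pairs = []
    for pcap in sorted(pcap_dir.rglob("*.pcap")):
        # Mirror the class sub-folder (for example an attack type) in csv_dir.
        target = csv_dir / pcap.relative_to(pcap_dir).with_suffix(".csv")
        if not target.exists():
            pairs.append((pcap, target))
    return pairs


def convert_all(
    pcap_dir: str,
    csv_dir: str,
    convert: Callable[[Path, Path], None],
) -> int:
    """Run `convert(pcap, target_csv)` for every pending capture.

    `convert` is a placeholder for whatever calls the flow analysis tool.
    Returns the number of captures processed in this run.
    """
    processed = 0
    for pcap, target in pending_conversions(Path(pcap_dir), Path(csv_dir)):
        target.parent.mkdir(parents=True, exist_ok=True)
        convert(pcap, target)
        processed += 1
    return processed
```

Keeping the tool call behind a callable means the same loop works whether the conversion is done through a subprocess, a wrapper script, or a manual export.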
Models / Methods
DeepFm.py
Ensemble.py
NuralNetwork.py
Randomforest.py
Data example
data.zip: examples of classified protocol and attack types