Abstract:
The objective of the project is to design a solution that allows the financial sector to achieve
greater profitability, developing a decision support system for loan management
based on Machine Learning techniques. This system, whose mission is to detect
patterns to help investors and lenders identify the most profitable operations, is supported
by a platform built in the cloud based on the services offered by Amazon Web Services.
To do this, use is made of a database published by the National Federal Mortgage
Association, more commonly known as FannieMae, a large company that offers most of the
mortgages in the United States, and that contains information on acquisitions and behavior
of mortgage loans in the United States, updated on a quarterly basis.
The platform consists of a set of instances that act as pieces within a Apache Spark
cluster, and a Hadoop distributed storage system. In addition, the objective of this plat-
form is to be easily scalable, and for this the solutions offered by Docker are applied, thus
allowing the platform to be adapted to computing needs.
After an extensive analysis, where the programming language PySpark is used, a so-
lution model is developed applying supervised classification algorithms, thus achieving a
system that allows identifying from the characteristics of the loan and the borrower those
that provide greater profitability.