Data Extraction, Storage and Processing
- SQL, Postgres, MySQL, SQLite
- ElasticSearch
- MongoDB
- Python
- Spark (PySpark API)
I am a self-disciplined, resilient, and proactive person.
I have a military academy background (Ex-Brazilian Navy officer) and a very curious and active mind.
Some time ago, I fell in love with data science and, since then, I've been focusing my energy and time on projects to solve business challenges using data science concepts and tools.
Currently, I work as a data scientist at Nubank, as an AI mentor at the "MIT Applied Data Science Program: Leveraging AI for Effective Decision-Making" and as an AI mentor at Social Good Brasil. I am also an AWS certified machine learning specialist and AWS certified cloud practitioner.
I am also committed to developing my skills in areas closely related to applied data science, such as product management, entrepreneurship, business and startups.
November 2022
After reading many books, attending many courses and doing a bunch of data science projects, I felt the need to define how I should move from real-world problems to real-world solutions in a structured way.
So, the purpose of this brief material is to share my initial summary of how to structure a problem-solving strategy. I emphasize that it is just my initial MVP on this subject. In other words, it is not meant to be a definitive solution, nor to replace any established framework!
I'm sharing this compilation so that anyone interested in this topic can learn or recall something useful for solving a real problem. If that happens, I will be delighted!
July 2022
The idea is to create synthetic data on customer behaviour for two groups of customers: control and treatment. We would generate this behaviour from statistical distributions (e.g. Poisson and Gamma distributions) and ingest both the generated customer behaviour and the distribution parameters into the data engineering architecture. The data would flow through the architecture, e.g. a data ingestion layer, a bronze layer, a silver layer, etc. As the output, we would have the customer behaviour data and its statistical distribution blueprint.
Then, we could use A/B testing tools to check whether there is a statistically significant difference between the control and the treatment groups. Since we know the original distribution of both groups, we know whether they really differ, so we can check whether the A/B tests give us the correct result or not (especially regarding type I and type II errors).
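The idea above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual pipeline: the function names, the Poisson rates (3.0 and 3.6), the sample sizes and the large-sample z-test are all assumptions chosen for illustration. Because the true rates are known, repeating the experiment many times gives an empirical estimate of the type I error rate (groups identical, test still rejects) and of the type II error rate (groups different, test fails to reject). A Gamma-distributed metric could be generated the same way with `random.gammavariate`.

```python
import math
import random
from statistics import NormalDist, mean, variance


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_group(rng, n, lam):
    """Synthetic behaviour (e.g. purchases per customer) for one group."""
    return [poisson_sample(rng, lam) for _ in range(n)]


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(lam_control, lam_treatment, n=200, runs=300, alpha=0.05, seed=42):
    """Fraction of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        control = simulate_group(rng, n, lam_control)
        treatment = simulate_group(rng, n, lam_treatment)
        if z_test_p_value(control, treatment) < alpha:
            rejections += 1
    return rejections / runs


# Hypothetical parameters: the "blueprint" that would travel with the data.
type_i_rate = rejection_rate(3.0, 3.0)   # groups truly identical
power = rejection_rate(3.0, 3.6)         # groups truly different
type_ii_rate = 1 - power

print(f"type I error rate  ~ {type_i_rate:.3f}")
print(f"type II error rate ~ {type_ii_rate:.3f}")
```

With a well-behaved test, the empirical type I rate should sit near the chosen `alpha`, while the type II rate depends on the effect size and the number of customers per group.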
March 2022
We all live in a society that produces an overwhelming amount of information daily. Information per se is valuable, but it's often very challenging to spotlight the essential part of it - the bottom line, so to speak. This mental filtering process can be very time-consuming and sometimes confusing.
With our technical solution, we provide an automated service that identifies the text's most relevant sentences so as to summarize the text. Additionally, the service provides the general sentiment (positive, neutral or negative) of the text. In other words, the final product will give the user a general idea about the text content as well as its most prominent sentiment.
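The summarization method used in the project is not described here, so the following is only a sketch of one common baseline for extractive summarization: score each sentence by the average frequency of its words and keep the top-scoring ones in their original order. The stopword list, the function names and the scoring rule are assumptions for illustration, and sentiment classification is left out.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {
    "the", "a", "an", "is", "are", "of", "to", "and", "in", "it",
    "that", "this", "for", "on", "with", "as", "be", "but",
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

For example, `summarize(article_text, max_sentences=3)` would return the three sentences whose words are most representative of the whole text.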
December 2021
Blocker Fraud Company is a company specialized in the detection of fraud in financial transactions made through mobile devices.
The company is expanding in Brazil and, to find new customers more quickly, it has adopted a very aggressive strategy.
The strategy works as follows:
The final solution includes a Power BI reporting dashboard with answers to business questions, a Docker container with an API implementation built with FastAPI and PySpark, and a MongoDB database where API requests are saved for future analyses. The estimated profit using this solution is BRL 230,133,584.05.
November 2021
The All in One Place company is a multi-brand outlet company that sells second-line products of several brands at a lower price through e-commerce.
Within just one year of operation, the marketing team realized that some customers buy more expensive products with high frequency and contribute to a significant portion of the company's revenue.
This project aims to determine which customers are eligible to participate in the Insiders program. Once this list is ready, the Marketing team will carry out a sequence of personalized and exclusive actions for this group of people to increase their sales and purchase frequency.
The final solution answers business questions, validates business hypotheses, creates a Metabase reporting dashboard and implements a solution architecture in the AWS cloud.
October 2021
Rossmann is a company that operates over 3,000 drug stores in 7 European countries. Its product range includes up to 21,700 items and can vary depending on the size of the shop and the location.
Rossmann store managers need daily sales predictions up to six weeks in advance so as to plan infrastructure investments in their stores (will the next six weeks' sales be high enough to balance infrastructure investment?).
The final solution for this problem is a Telegram bot where the user just needs to type the number of the store and the bot quickly replies with the sales prediction for that store over the next six weeks.
Besides, if the final user wants more detailed information about this six-week prediction, they can get further details on a Streamlit data app, which shows an interactive chart of the predicted sales over these six weeks.
Furthermore, on this data app, the user can also read the entire project overview to better understand how this prediction is made.
September 2021
Insurance All is a health insurance company and its product team is analyzing the possibility of offering a new product, automobile insurance, to its health insurance clients.
As with its health insurance, customers of this new insurance plan would pay an annual premium to be insured by Insurance All in case of a car accident or damage.
In this project, I developed a machine learning model that increases the number of interested customers contacted by 1,316 and 2,259 for 20,000 and 40,000 sales team contacts, respectively, so that the estimated revenue increases are US$ 131,600 and US$ 225,900.
Feel free to contact me in case of questions about my projects, data science opportunities and any other reason you think is relevant ;)