Are you really who you say you are?

What does your online dating profile reveal about you?

Nadir Sarigul
CodeX

Online dating has disrupted more traditional ways of meeting romantic partners and has become one of the central ways we interact with one another in the hope of finding love. It has been around for many years, but while in 2013 only 11% of U.S. adults reported having used a dating site or app, the proportion of users has steadily increased over time, with 30% of U.S. adults reporting in 2019 that they had used a dating site or app (Pew Research Center).

But why has online dating become so popular? For one, the way we as a society live our lives has changed dramatically as technology has advanced and become an integral part of our lives. The world is now at one’s fingertips! At the same time, we have become busier, with tight schedules and a lot of pressure in our professional lives. Online dating therefore presents itself as an easy and effective way to meet people with the same interests and beliefs. It also significantly broadens the pool of potential romantic partners well beyond one’s social circle.

But how effective is online dating at producing meaningful relationships? According to the Pew Research Center, out of the 30% of U.S. adults who have reported using online dating sites or apps, only 12% reported having married or been in a committed relationship with someone they met through online dating. In fact, 50% of U.S. adults think that online dating has not really affected dating and relationships, while 22% think that it has had a positive effect. However, 26% think that online dating has had a mostly negative impact on dating. This is largely attributed to dishonesty and misrepresentation in users’ profiles, which stems from the fact that it is far easier to lie online than offline, particularly about one’s physical appearance or job. Online dating lies are often subtle: they represent a person’s attempt to portray themselves in the best possible light, with only slight exaggerations. Some users, however, take deception much further and invent an entirely new persona. This practice, known as “catfishing”, has become more and more prevalent, as highlighted by its presence in new reality shows such as Netflix’s The Circle.

After many years on online dating platforms, I started wondering: can a profile reveal a person’s true identity? Using the OkCupid profile dataset from the Date-A-Scientist project (Codecademy), which contains information from 59,946 users, I used machine learning tools to build models that predict the gender of OkCupid users. This dataset was scraped from active profiles in 2012 and contains several layers of information: gender, sexual orientation, ethnicity, physical features, and drinking and drug habits, as well as income, religion, and education level.

Because developing machine learning models relies on repeated attempts to improve the success rate of a classification task, establishing the baseline success rate is an important first step. We can do this with a Dummy Classifier, which sets the baseline performance (i.e., the success rate one should expect to achieve by simply guessing).
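To make the idea of a baseline concrete, here is a minimal sketch using scikit-learn's DummyClassifier. It does not use the real OkCupid data: the labels are synthetic (an assumption), drawn with roughly three males for every two females, and the "most_frequent" strategy is one plausible choice rather than necessarily the one used for the results shown here.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic stand-in for the gender labels (assumption): about three
# males ("m") for every two females ("f").
rng = np.random.default_rng(0)
y = rng.choice(["m", "f"], size=1000, p=[0.6, 0.4])
X = np.zeros((len(y), 1))  # the dummy model ignores the features entirely

# "most_frequent" always predicts the majority class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
y_pred = baseline.predict(X)

baseline_accuracy = accuracy_score(y, y_pred)
female_recall = recall_score(y, y_pred, pos_label="f")

print(f"Baseline accuracy: {baseline_accuracy:.2f}")
print(f"Recall for females: {female_recall:.2f}")
```

Any real model has to beat this accuracy to be worth keeping, and the zero recall for the minority class shows why accuracy alone can be misleading on an imbalanced dataset.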

Results of Dummy Classifier Model

In this case, the Dummy Classifier always predicts that the user is male, which we know from looking at the gender data is definitely not the case! Although this dataset has more males than females (a ratio of about two females for every three males), it certainly includes many females.

Let’s see if we can improve our predictions by using more sophisticated algorithms. Logistic regression is a machine learning algorithm based on the logistic function, which statisticians developed to describe properties of population growth. All things considered, logistic regression is a simple and very efficient way to build predictive models for binary classification. So I decided to start by seeing whether a logistic regression model would provide a good prediction of the gender of OkCupid users.
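As a rough illustration of this step, the sketch below fits a logistic regression and reports recall per gender. The data is synthetic and the two features (a height column in inches and a pure-noise column) are hypothetical stand-ins, so the numbers it prints say nothing about the real OkCupid profiles.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): males are taller on average than females.
rng = np.random.default_rng(42)
n = 2000
is_male = rng.random(n) < 0.6
height = np.where(is_male, rng.normal(70, 3, n), rng.normal(65, 3, n))
noise = rng.normal(0, 1, n)  # an uninformative feature
X = np.column_stack([height, noise])
y = np.where(is_male, "m", "f")

# Hold out a test set, keeping the gender ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale the features, then fit the logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = model.score(X_test, y_test)
recall_f, recall_m = recall_score(
    y_test, y_pred, labels=["f", "m"], average=None
)
print(f"Accuracy: {accuracy:.2f}")
print(f"Recall (female): {recall_f:.2f}, recall (male): {recall_m:.2f}")
```

Reporting recall separately for each gender, rather than accuracy alone, is what makes the comparison with the always-male baseline meaningful.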

Results of the Logistic Regression Classifier

Looking at the results, the logistic regression model appears to be fairly accurate at predicting the gender of OkCupid users: it correctly identifies 84% of the female users and 91% of the male users. Nevertheless, let’s see if we can improve our predictions further using different classifiers.

The Decision Tree algorithm makes a sequence of hierarchical decisions. The goal of using a Decision Tree is to create a model that can predict the class or value of the target variable by learning simple decision rules inferred from prior data.
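The "simple decision rules" can be inspected directly. The following sketch trains a deliberately shallow tree on synthetic data (the height and noise features are hypothetical, not the actual OkCupid columns) and prints the learned rules with scikit-learn's export_text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data (assumption): one informative feature and one noise feature.
rng = np.random.default_rng(3)
n = 1000
is_male = rng.random(n) < 0.6
height = np.where(is_male, rng.normal(70, 3, n), rng.normal(65, 3, n))
noise = rng.normal(0, 1, n)
X = np.column_stack([height, noise])
y = np.where(is_male, "m", "f")

# A depth limit keeps the rules short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Each line of the output is one threshold test on a feature.
rules = export_text(tree, feature_names=["height", "noise"])
print(rules)
```

Each branch of the printed output is a threshold on one feature, which is exactly the kind of sequential rule the algorithm learns.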

Results of the Decision Tree Classifier

It looks like our decision tree model is no better than the logistic regression model. Perhaps a single decision tree is not enough to clearly separate genders given the very large number of features in our dataset. We may need to combine multiple decision trees for the model to perform as well as the logistic regression, or better. We can do this using the Random Forest algorithm, a tree-based algorithm that leverages the power of multiple decision trees to make decisions.
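One way to compare a single tree with a forest is cross-validation, sketched below on synthetic data. The features, the number of trees, and the five folds are all assumptions made for illustration, so the scores do not reproduce the results in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): one informative feature plus three noise features.
rng = np.random.default_rng(1)
n = 1500
is_male = rng.random(n) < 0.6
height = np.where(is_male, rng.normal(70, 3, n), rng.normal(65, 3, n))
noise = rng.normal(0, 1, (n, 3))
X = np.column_stack([height, noise])
y = np.where(is_male, "m", "f")

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model.
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print(f"Single tree:   {tree_scores.mean():.3f}")
print(f"Random forest: {forest_scores.mean():.3f}")
```

Averaging many trees, each trained on a different bootstrap sample and random subset of features, usually reduces the overfitting that a single unpruned tree is prone to.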

Results of the Random Forest Classifier

Looking at our random forest model, we can see that it is an improvement over the single decision tree. However, the logistic regression model looks more robust than either the single decision tree or the random forest, as it has the best accuracy while also achieving high recall for both genders.

An interesting property of these predictive models is that they also tell us which features they gave the most importance to when making their predictions, which gives us insight into what to look for in an online dating profile.
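For a random forest in scikit-learn, these weights are exposed through the feature_importances_ attribute. The sketch below ranks three hypothetical features (height, job_computer, and noise) on synthetic data; the names and the way the data is generated are assumptions, not the actual OkCupid columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): height is strongly informative, a
# computer-related job is weakly informative, and noise is uninformative.
rng = np.random.default_rng(7)
n = 2000
is_male = rng.random(n) < 0.6
height = np.where(is_male, rng.normal(70, 3, n), rng.normal(65, 3, n))
job_computer = (rng.random(n) < np.where(is_male, 0.15, 0.05)).astype(int)
noise = rng.normal(0, 1, n)

feature_names = ["height", "job_computer", "noise"]
X = np.column_stack([height, job_computer, noise])
y = np.where(is_male, "m", "f")

forest = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
forest.fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:>14}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values, so scikit-learn's permutation_importance is worth considering as a cross-check.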

Feature Importances Calculated from Random Forest Classifier

Height is by far the most important feature considered by these predictive models. Looking at the data, we can see a clear separation between the heights of males and females in this dataset. It is also something we know from simply observing the people around us: men are generally taller than women.

Another feature in online dating profiles that seems to be important in distinguishing males from females is body type, with women generally describing their body type as curvier and fuller than men do.

Sexual orientation, particularly identification as bisexual, also carries some weight in these models’ decisions, and we can see that more women than men report being bisexual in their OkCupid profiles.

Professionally, there are also some tendencies that we can take into consideration. For example, computer-related jobs are far more common among men, while medicine and education-related jobs are more common among women.

So, if you find yourself on an online dating site or app and you have a suspicion about the real gender of your date, take a good look at these specific features to see if they can give you a good clue about who you are really talking to. Good luck out there :)

Check out my GitHub for the Code!
