Create fake data in Python
Whether it's web application development or REST API development, data science or machine learning, data is involved in nearly every field of computer science. As Python developers, we often need some sample fake data on hand, either to test an application's data flow or simply to learn how machine learning models work.
In both cases, it helps to have a tool in our arsenal that can generate sample fake data in the blink of an eye. That's exactly what Python's faker library does.
What is the faker library?
According to the library's official documentation, "Faker is a Python package that generates fake data for you." The library is inspired by its counterparts in other programming languages, such as PHP Faker, Perl Faker, and Ruby Faker.
We'll learn more about its functionality and capabilities as we progress through this post.
Getting Started
Installation
To get started, make sure Python 3 and pip are set up properly on your system. If you are not sure how to get started, have a look at this video.
We will start by installing the faker package. Open the Command Prompt on Windows (or a terminal on macOS or Linux) and type:
For Windows:
pip install faker
For macOS/Linux:
pip3 install faker
Next, wait for the installation to complete. Once it's done, we can move on to importing the module in Python.
Within the command prompt (or shell), type
python
to open the Python shell (on macOS/Linux, type python3 instead). Next, we will import the Faker class from the faker module by typing
from faker import Faker
Once the import is done, we will create an object of the Faker class to generate data. Let's name the object fake_obj for now:
# create Faker class object
fake_obj = Faker()
Now we can use the fake_obj object to generate fake data.
Generating Fake Data
The faker package groups its generator functions into providers, each responsible for a particular kind of data. Together with the providers contributed by the community, there is far too much to cover in a single blog post, so we will quickly go through a few of them, just to get an idea of how it works.
Generating a person's identity details
Using the "faker.providers.person" provider, we can generate a variety of personal identity data such as first name, last name, prefix, and suffix. For example:
# get first name
fake_obj.first_name()
# get last name
fake_obj.last_name()
# get name prefix
fake_obj.prefix()
# get name suffix
fake_obj.suffix()
Generating date and time details
Using the "faker.providers.date_time" provider, we can generate random date and time details in various formats. For example:
To get a random date between any two dates
# import datetime and timedelta
from datetime import datetime, timedelta

# current date minus 365 days
start_date = datetime.now() - timedelta(days=365)
# current date
end_date = datetime.now()

# get a random date between start_date and end_date;
# if you call this function without passing start_date and end_date,
# a random date between 30 years ago and today is generated
fake_obj.date_between(start_date, end_date)
To get a random date and time between any two datetimes:
# works like date_between, but also generates a random time of day
# and returns a datetime instead of a date
fake_obj.date_time_between(start_date, end_date)
Generating user profile details
Using the "faker.providers.profile" provider, we can generate random user profile data, including details such as job, company, SSN, and residence.
# generate a fake profile
fake_obj.profile()
The generated profile data looks like this:
{
    "job": "Musician",
    "company": "Williams-Sheppard",
    "ssn": "498-52-4970",
    "residence": "Unit 5938 Box 2421\nDPO AP 33335",
    "current_location": (Decimal("52.958961"), Decimal("143.143712")),
    "blood_group": "B+",
    "website": [
        "http://www.rivera.com/",
        "http://grimes-green.net/",
        "http://www.larsen.com/"
    ],
    "username": "leeashley",
    "name": "Gary Cross",
    "sex": "M",
    "address": "711 Golden Overpass\nWest Andreaville, OH 44115",
    "mail": "tamaramorrison@hotmail.com",
    "birthdate": datetime.date(1946, 4, 11)
}
Generating paragraph and sentence data
Using "faker.providers.lorem", we can generate fake paragraphs and
sentences.
# generate a random paragraph
fake_obj.paragraph()
# generate a random sentence
fake_obj.sentence()
Conclusion
There are many more standard and community providers that can generate a variety of random data, such as airport names and credit scores.
To get a list of all community providers, view this page.
If you have any queries, please let me know.