`Creating Environment for the project`

Create a conda environment for the project.

conda create -n book python=3.7 -y

Activate the conda environment.

conda activate book

Install the required libraries.

pip install pandas

pip install numpy

pip install scipy

pip install scikit-learn

Dataset: https://www.kaggle.com/ra4u12/bookrecommendation

`Preprocessing the book dataset`

import pandas as pd
import numpy as np
import pickle
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
books = pd.read_csv('data/books.csv', sep=";", on_bad_lines='skip', encoding='latin-1')
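
The separator and bad-line options matter here: the files are semicolon-delimited, and bad-line skipping is enabled, presumably because some rows contain stray delimiters. The sketch below shows the effect on a made-up in-memory sample. It assumes pandas 1.3 or newer, where `on_bad_lines='skip'` is the spelling of the older `error_bad_lines=False`; the sample text and the `dtype=str` choice are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Made-up semicolon-delimited sample; the third data row has one field too many.
raw = (
    "ISBN;Book-Title;Book-Author\n"
    "0001;First Book;Author A\n"
    "0002;Second Book;Author B\n"
    "0003;Third; Book;Author C\n"
    "0004;Fourth Book;Author D\n"
)

# on_bad_lines='skip' drops rows with too many fields instead of raising an error.
# dtype=str keeps values such as '0001' as text rather than parsing them as integers.
sample = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip", dtype=str)

print(sample.shape)
print(sample["ISBN"].tolist())
```

Skipped rows disappear silently, so it may be worth comparing the loaded row count against the raw line count when data quality matters.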

books.sample(2)

books.iloc[237]['Image-URL-L']

'http://images.amazon.com/images/P/0671027387.01.LZZZZZZZ.jpg'

books.shape

(271360, 8)

books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

Here we keep only the columns we need, plus the large image URL column (`Image-URL-L`), which is used as the book cover.

books = books[['ISBN','Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher','Image-URL-L']]

books.sample(5)

books.rename(columns={"Book-Title":'title',
                      'Book-Author':'author',
                     "Year-Of-Publication":'year',
                     "Publisher":"publisher",
                     "Image-URL-L":"image_url"},inplace=True)

books.sample(5)

`Preprocessing the user CSV file`

# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
users = pd.read_csv('./data/user.csv', sep=";", on_bad_lines='skip', encoding='latin-1')

users.sample(5)

users.shape

(278858, 3)

users.rename(columns={"User-ID":'user_id',
                      'Location':'location',
                     "Age":'age'},inplace=True)

users.sample(5)

`Preprocessing the book rating CSV file`

# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
ratings = pd.read_csv('./data/bookrating.csv', sep=";", on_bad_lines='skip', encoding='latin-1')

ratings.sample(5)

ratings.shape

(1149780, 3)

ratings.rename(columns={"User-ID":'user_id',
                      'Book-Rating':'rating'},inplace=True)

ratings.sample(5)

Now we have three DataFrames:
- books
- users
- ratings

print(books.shape, users.shape, ratings.shape, sep='\n')

(271360, 6)
(278858, 3)
(1149780, 3)

ratings['user_id'].value_counts()

11676     13602
198711     7550
153662     6109
98391      5891
35859      5850
          ...  
116180        1
116166        1
116154        1
116137        1
276723        1
Name: user_id, Length: 105283, dtype: int64

ratings['user_id'].value_counts().shape

(105283,)

ratings['user_id'].unique().shape

(105283,)

x = ratings['user_id'].value_counts() > 200

x[x].shape

(899,)

y= x[x].index

y

Int64Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352,
            110973, 235105,
            ...
            260183,  73681,  44296, 155916,   9856, 274808,  28634,  59727,
            268622, 188951],
           dtype='int64', length=899)

ratings = ratings[ratings['user_id'].isin(y)]

ratings.head()

ratings.shape

(526356, 3)
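
The filtering above keeps only users who have rated more than 200 books, which shrinks `ratings` from 1149780 to 526356 rows. A minimal sketch of the same pattern on a made-up table follows; the toy values and the `min_ratings` name are assumptions for illustration only.

```python
import pandas as pd

# Toy ratings table (made-up values): user 1 has 3 ratings, user 2 has 1, user 3 has 2.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "ISBN": ["a", "b", "c", "a", "b", "c"],
    "rating": [5, 0, 8, 7, 0, 9],
})

min_ratings = 1  # hypothetical threshold; the walkthrough uses 200

# Count ratings per user, then keep users strictly above the threshold.
counts = toy["user_id"].value_counts()
active_users = counts[counts > min_ratings].index
filtered = toy[toy["user_id"].isin(active_users)]

print(sorted(active_users))
print(filtered.shape)
```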

ratings_with_books = ratings.merge(books, on='ISBN')

ratings_with_books.head()

ratings_with_books.shape

(487671, 8)
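
The row count drops from 526356 to 487671 after the merge. `DataFrame.merge` performs an inner join by default, so this is presumably because ratings whose ISBN does not appear in `books` are discarded. The sketch below uses made-up tables (`toy_ratings` and `toy_books` are illustrative names) to show how a left join with `indicator=True` can count such unmatched rows.

```python
import pandas as pd

# Made-up tables: the ISBN 'zzz' has a rating but no entry in the book catalogue.
toy_ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "ISBN": ["a", "b", "zzz"],
    "rating": [5, 0, 8],
})
toy_books = pd.DataFrame({
    "ISBN": ["a", "b", "c"],
    "title": ["A", "B", "C"],
})

# Default merge is an inner join: only ISBNs present in both tables survive.
merged = toy_ratings.merge(toy_books, on="ISBN")

# A left join with indicator=True marks ratings that found no matching book.
check = toy_ratings.merge(toy_books, on="ISBN", how="left", indicator=True)
unmatched = int((check["_merge"] == "left_only").sum())

print(len(merged), unmatched)
```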

number_rating = ratings_with_books.groupby('title')['rating'].count().reset_index()

number_rating.sample(5)

number_rating.rename(columns={'rating':'num_of_rating'},inplace=True)

number_rating.head()

final_rating = ratings_with_books.merge(number_rating, on='title')

final_rating.head()

final_rating.shape

(487671, 9)

final_rating = final_rating[final_rating['num_of_rating'] >= 50]

final_rating.head()

final_rating.shape

(61853, 9)
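
The steps above count the ratings per title, merge the counts back, and keep titles with at least 50 ratings. As an alternative sketch (not the notebook's method), `groupby(...).transform('count')` attaches the per-title count to every row in one step. The toy data and the `min_count` name are assumptions for illustration.

```python
import pandas as pd

# Made-up ratings: title A has 3 ratings, B has 2, C has 1.
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2, 1],
    "title": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 0, 8, 7, 0, 9],
})

min_count = 2  # hypothetical threshold; the walkthrough uses 50

# transform('count') broadcasts each title's rating count back onto its rows.
# Every row is counted, including rows whose rating is 0.
toy["num_of_rating"] = toy.groupby("title")["rating"].transform("count")
popular = toy[toy["num_of_rating"] >= min_count]

print(sorted(popular["title"].unique()))
```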

final_rating.drop_duplicates(['user_id','title'],inplace=True)

final_rating.shape

(59850, 9)

book_pivot = final_rating.pivot_table(columns='user_id', index='title', values= 'rating')

book_pivot

book_pivot.shape

(742, 888)

book_pivot.fillna(0, inplace=True)

book_pivot
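
The pivot step turns the long ratings table into a title-by-user matrix, and `fillna(0)` replaces every (title, user) pair without a rating by 0. A small sketch on made-up data shows the shape of the result; the toy values are assumptions for illustration.

```python
import pandas as pd

# Made-up long-format ratings: each row is one (user, title, rating) triple.
toy = pd.DataFrame({
    "user_id": [1, 2, 1, 3],
    "title": ["A", "A", "B", "B"],
    "rating": [5, 8, 7, 9],
})

# Rows become titles, columns become users; unrated pairs start out as NaN.
pivot = toy.pivot_table(index="title", columns="user_id", values="rating")
n_missing = int(pivot.isna().sum().sum())

# After fillna(0), "not rated" and "rated 0" can no longer be told apart.
pivot = pivot.fillna(0)

print(n_missing)
print(pivot)
```

Note that `pivot_table` averages duplicate (title, user) entries by default, so the earlier `drop_duplicates` call decides which single rating is kept instead.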

`Training the model`

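We convert the pivot table to a SciPy `csr_matrix` (compressed sparse row). The table is mostly zeros, and the CSR format stores only the nonzero entries while supporting fast row slicing and fast matrix-vector products.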

book_sparse = csr_matrix(book_pivot)
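
As a rough illustration of what the CSR conversion stores, the sketch below builds a sparse matrix from a small made-up dense array and inspects its nonzero count and one row. The array values are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up 3x4 rating matrix with 4 nonzero entries out of 12.
dense = np.array([
    [9.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 8.0, 0.0],
    [0.0, 7.0, 0.0, 10.0],
])

sparse = csr_matrix(dense)

# nnz is the number of explicitly stored entries.
density = sparse.nnz / (dense.shape[0] * dense.shape[1])

# Row slicing is cheap in CSR; toarray() converts the slice back to dense.
row = sparse[2].toarray()

print(sparse.nnz, round(density, 3))
print(row)
```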

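Next we fit scikit-learn's `NearestNeighbors`. It is an unsupervised neighbour-search estimator rather than a clustering algorithm: for a given book, it finds the books whose rating vectors are closest. With `algorithm='brute'` it compares the query against every row.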

model = NearestNeighbors(algorithm= 'brute')

model.fit(book_sparse)

NearestNeighbors(algorithm='brute')

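We compute the distances to, and row indices of, the six nearest neighbours of the book at row 237 of `book_pivot`. The first neighbour is the query book itself, at distance 0.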

distance, suggestion = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1,-1), n_neighbors=6 )

pickle.dump(model, open('./model/model.pkl','wb'))

distance

array([[ 0.        , 68.78953409, 69.5413546 , 72.64296249, 76.83098333,
        77.28518616]])

suggestion

array([[237, 240, 238, 241, 184, 536]], dtype=int64)
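
The distances above are Euclidean distances between rating vectors, since `NearestNeighbors` defaults to the Minkowski metric with `p=2`. The sketch below checks this on a made-up 4-book matrix by recomputing the distances with NumPy; the toy values and the `toy_model` name are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up item-by-user matrix: 4 books (rows) rated by 3 users (columns).
toy = np.array([
    [9.0, 0.0, 8.0],
    [8.0, 0.0, 9.0],
    [0.0, 7.0, 0.0],
    [0.0, 0.0, 1.0],
])

toy_model = NearestNeighbors(algorithm="brute")  # default metric is Euclidean
toy_model.fit(toy)

# Query with book 0; the result includes book 0 itself at distance 0.
distance, suggestion = toy_model.kneighbors(toy[0].reshape(1, -1), n_neighbors=3)

# Recompute the same distances by hand for the returned rows.
manual = np.linalg.norm(toy[suggestion[0]] - toy[0], axis=1)

print(suggestion)
print(distance)
```

Euclidean distance on raw ratings is sensitive to how many ratings a book has; passing `metric="cosine"` to `NearestNeighbors` is one alternative that could be tried, though the notebook does not do so.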

book_pivot.iloc[241,:]

user_id
254       9.0
2276      0.0
2766      0.0
2977      0.0
3363      0.0
         ... 
275970    9.0
277427    0.0
277478    0.0
277639    0.0
278418    0.0
Name: Harry Potter and the Sorcerer's Stone (Book 1), Length: 888, dtype: float64

Now print the titles suggested by the nearest-neighbour search.

for i in range(len(suggestion)):
    print(book_pivot.index[suggestion[i]])

Index(['Harry Potter and the Chamber of Secrets (Book 2)',
       'Harry Potter and the Prisoner of Azkaban (Book 3)',
       'Harry Potter and the Goblet of Fire (Book 4)',
       'Harry Potter and the Sorcerer's Stone (Book 1)', 'Exclusive',
       'The Cradle Will Fall'],
      dtype='object', name='title')

book_pivot.index[5]

'A Bend in the Road'

book_names = book_pivot.index

book_names[2]

'2nd Chance'

np.where(book_pivot.index == 'A Bend in the Road')[0][0]

5

Find the image URL for a given title.

ids = np.where(final_rating['title'] == "Harry Potter and the Chamber of Secrets (Book 2)")[0][0]

final_rating.iloc[ids]['image_url']

'http://images.amazon.com/images/P/0439064872.01.LZZZZZZZ.jpg'

Store the book titles for the indices returned by the nearest-neighbour search.

book_name = []
for book_id in suggestion:
    book_name.append(book_pivot.index[book_id])

book_name[0]

Index(['Harry Potter and the Chamber of Secrets (Book 2)',
       'Harry Potter and the Prisoner of Azkaban (Book 3)',
       'Harry Potter and the Goblet of Fire (Book 4)',
       'Harry Potter and the Sorcerer's Stone (Book 1)', 'Exclusive',
       'The Cradle Will Fall'],
      dtype='object', name='title')

For each suggested title, find the position of its first row in `final_rating`.

ids_index = []
for name in book_name[0]: 
    ids = np.where(final_rating['title'] == name)[0][0]
    ids_index.append(ids)

Now print the image URL stored at each position in `ids_index`.

for idx in ids_index:
    url = final_rating.iloc[idx]['image_url']
    print(url)

http://images.amazon.com/images/P/0439064872.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0439136369.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/043936213X.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0446604232.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0440115450.01.LZZZZZZZ.jpg
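
The lookup above scans `final_rating` once per title with `np.where`. As an alternative sketch (not the notebook's method), a title-indexed Series can be built once and queried for any list of titles. The toy data, the `example.com` URLs, and the `url_by_title` name are assumptions for illustration.

```python
import pandas as pd

# Made-up table with repeated titles, as in a ratings table with one row per rating.
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C", "C"],
    "image_url": [
        "http://example.com/a.jpg",
        "http://example.com/a.jpg",
        "http://example.com/b.jpg",
        "http://example.com/c.jpg",
        "http://example.com/c.jpg",
    ],
})

# Keep the first row per title, then index by title for direct lookup.
url_by_title = toy.drop_duplicates("title").set_index("title")["image_url"]

suggested = ["C", "A"]
urls = url_by_title.loc[suggested].tolist()

print(urls)
```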

Now store the model and the related objects in pickle format so they can be loaded at prediction time.

pickle.dump(model,open('artifacts/model.pkl','wb'))
pickle.dump(book_names,open('artifacts/book_names.pkl','wb'))
pickle.dump(final_rating,open('artifacts/final_rating.pkl','wb'))
pickle.dump(book_pivot,open('artifacts/book_pivot.pkl','wb'))
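
The dump calls above assume that the `artifacts/` directory already exists, and they leave closing the files to the garbage collector. A minimal round-trip sketch using `with` blocks follows. It writes to a temporary directory rather than `artifacts/`, and the two-title index is a stand-in for the real `book_names`.

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in for the real book_names index.
book_names = pd.Index(["A Bend in the Road", "2nd Chance"], name="title")

with tempfile.TemporaryDirectory() as artifacts_dir:
    path = os.path.join(artifacts_dir, "book_names.pkl")

    # The with block closes the file as soon as the dump finishes.
    with open(path, "wb") as f:
        pickle.dump(book_names, f)

    # Loading mirrors dumping, with mode 'rb'.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded.equals(book_names))
```

Pickled scikit-learn and pandas objects are generally only safe to load with library versions compatible with the ones that wrote them.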

`Testing the model`

def recommend_book(book_name):
    # Row position of the searched title in the pivot table
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # Indices of the 6 nearest rows; the searched book itself is one of them
    distance, suggestion = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1,-1), n_neighbors=6)

    for i in range(len(suggestion)):
        books = book_pivot.index[suggestion[i]]
        for j in books:
            if j == book_name:
                print(f"You searched '{book_name}'\n")
                print("The suggestion books are: \n")
            else:
                print(j)

book_name = "Harry Potter and the Chamber of Secrets (Book 2)"
recommend_book(book_name)

You searched 'Harry Potter and the Chamber of Secrets (Book 2)'

The suggestion books are: 

Harry Potter and the Prisoner of Azkaban (Book 3)
Harry Potter and the Goblet of Fire (Book 4)
Harry Potter and the Sorcerer's Stone (Book 1)
Exclusive
The Cradle Will Fall
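
For use outside a notebook, a variant that returns the titles instead of printing them may be easier to work with. The sketch below is one possible shape, run on made-up data. The function name `recommend_titles`, its parameters, and the toy pivot table are assumptions for illustration, not part of the original project.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Made-up title-by-user matrix standing in for book_pivot.
toy_pivot = pd.DataFrame(
    [[9.0, 0.0, 8.0], [8.0, 0.0, 9.0], [0.0, 7.0, 0.0], [0.0, 6.0, 1.0]],
    index=pd.Index(["Book A", "Book B", "Book C", "Book D"], name="title"),
    columns=[1, 2, 3],
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)


def recommend_titles(book_name, pivot, model, n=2):
    """Return up to n nearest titles to book_name, excluding book_name itself."""
    matches = np.where(pivot.index == book_name)[0]
    if len(matches) == 0:
        return []  # unknown title: nothing to recommend
    row = pivot.iloc[matches[0]].values.reshape(1, -1)
    # Ask for one extra neighbour because the query book is returned too.
    _, suggestion = model.kneighbors(row, n_neighbors=n + 1)
    titles = pivot.index[suggestion[0]]
    return [t for t in titles if t != book_name][:n]


print(recommend_titles("Book A", toy_pivot, toy_model))
print(recommend_titles("Missing Title", toy_pivot, toy_model))
```

Returning an empty list for an unknown title avoids the `IndexError` that the `[0][0]` lookup raises when no row matches.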

Rendered DataFrame outputs from the preview cells above, in the order the cells appear:

	ISBN	Book-Title	Book-Author	Year-Of-Publication	Publisher	Image-URL-S	Image-URL-M	Image-URL-L
141793	186508400X	Follow the Blue	Brigid Lowry	2001	Allen & Unwin (Australia) Pty Ltd	http://images.amazon.com/images/P/186508400X.0...	http://images.amazon.com/images/P/186508400X.0...	http://images.amazon.com/images/P/186508400X.0...
238570	078712043X	Cat Who Tailed a Thief (Cat Who... (Audio))	Lillian Jackson Braun	1999	Audio Literature	http://images.amazon.com/images/P/078712043X.0...	http://images.amazon.com/images/P/078712043X.0...	http://images.amazon.com/images/P/078712043X.0...

	ISBN	Book-Title	Book-Author	Year-Of-Publication	Publisher	Image-URL-L
10488	0515085995	The Mick	Mickey Mantle	1986	Jove Books	http://images.amazon.com/images/P/0515085995.0...
149036	3453189876	So wahr mir Gott helfe.	John T. Lescroart	2001	Heyne	http://images.amazon.com/images/P/3453189876.0...
66324	0441005829	The Lady in the Loch	Elizabeth Ann Scarborough	1998	Ace Books	http://images.amazon.com/images/P/0441005829.0...
225330	0395171709	Field Guide to Shells in Atlantic Gulf Coast	Percy A. Morris	1973	Houghton Mifflin Company	http://images.amazon.com/images/P/0395171709.0...
13227	1551669234	Stonebrook Cottage	Carla Neggers	2002	Mira	http://images.amazon.com/images/P/1551669234.0...

	ISBN	title	author	year	publisher	image_url
99794	0590404261	Teacher's Pet (Couples, No 21)	M.E. Cooper	1987	Scholastic Paperbacks (Mm)	http://images.amazon.com/images/P/0590404261.0...
246698	0590250825	Revenge (Nightmare Hall, No 26)	Diane Hoh	1995	Scholastic Paperbacks	http://images.amazon.com/images/P/0590250825.0...
82550	840309289X	Cuando Comer es un Infierno: Confesiones de un...	Espido Freire	2002	Aguilar	http://images.amazon.com/images/P/840309289X.0...
247672	0373115008	Tiger Moon (Harlequin Presents, 1500)	Kristy McCallum	1992	Harlequin	http://images.amazon.com/images/P/0373115008.0...
182015	0671509500	Chubby Tugboat	N. Bradley Omar	1984	Simon & Schuster Merchandise &	http://images.amazon.com/images/P/0671509500.0...

	User-ID	Location	Age
48171	48172	rockville, maryland, usa	NaN
84979	84980	somerville, new jersey, usa	NaN
214199	214200	magnolia, texas, usa	NaN
76944	76945	proserpine, queensland, australia	64.0
125599	125600	ferrara, emilia romagna, italy	NaN

	user_id	location	age
236446	236447	victoria, british columbia, canada	NaN
59431	59432	london, england, united kingdom	26.0
73686	73687	san diego, california, vietnam	NaN
188154	188155	la mesa, california, usa	19.0
256098	256099	laguna beach, california, usa	NaN

	User-ID	ISBN	Book-Rating
437318	104636	0515130044	0
635481	153662	0517122057	0
1114788	267354	0671795538	8
43959	11601	0425075826	0
747930	180945	0786867876	0

	user_id	ISBN	rating
134434	30744	0373835116	0
877603	212835	0451201515	0
1085640	259711	0698119320	0
207867	48025	0743444507	9
346741	82893	006096491X	0

	user_id	ISBN	rating
1456	277427	002542730X	10
1457	277427	0026217457	0
1458	277427	003008685X	8
1459	277427	0030615321	0
1460	277427	0060002050	0

	user_id	ISBN	rating	title	author	year	publisher	image_url
0	277427	002542730X	10	Politically Correct Bedtime Stories: Modern Ta...	James Finn Garner	1994	John Wiley & Sons Inc	http://images.amazon.com/images/P/002542730X.0...
1	3363	002542730X	0	Politically Correct Bedtime Stories: Modern Ta...	James Finn Garner	1994	John Wiley & Sons Inc	http://images.amazon.com/images/P/002542730X.0...
2	11676	002542730X	6	Politically Correct Bedtime Stories: Modern Ta...	James Finn Garner	1994	John Wiley & Sons Inc	http://images.amazon.com/images/P/002542730X.0...
3	12538	002542730X	10	Politically Correct Bedtime Stories: Modern Ta...	James Finn Garner	1994	John Wiley & Sons Inc	http://images.amazon.com/images/P/002542730X.0...
4	13552	002542730X	0	Politically Correct Bedtime Stories: Modern Ta...	James Finn Garner	1994	John Wiley & Sons Inc	http://images.amazon.com/images/P/002542730X.0...

	title	rating
38665	El conocimiento secreto de Jesucristo	1
22915	Charles & Diana : Prince & Princess Of...	1
6907	After the Cabaret	1
106791	Smoke Jumper (Choose Your Own Adventure, No 111)	2
110820	Stress Remedies: Hundreds of Fast-Relief Tips ...	1

	title	num_of_rating
0	A Light in the Storm: The Civil War Diary of ...	2
1	Always Have Popsicles	1
2	Apple Magic (The Collector's series)	1
3	Beyond IBM: Leadership Marketing and Finance ...	1
4	Clifford Visita El Hospital (Clifford El Gran...	1

user_id	254	2276	2766	2977	3363	3757	4017	4385	6242	6251	...	274004	274061	274301	274308	274808	275970	277427	277478	277639	278418
title
1984	9.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	0.0	NaN	NaN	NaN	NaN
1st to Die: A Novel	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2nd Chance	NaN	10.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	0.0	NaN	NaN	NaN	NaN	0.0	NaN
4 Blondes	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
84 Charing Cross Road	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	10.0	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Year of Wonders	NaN	NaN	NaN	7.0	NaN	NaN	NaN	NaN	7.0	NaN	...	NaN	NaN	NaN	NaN	NaN	0.0	NaN	NaN	NaN	NaN
You Belong To Me	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values	NaN	NaN	NaN	NaN	0.0	NaN	NaN	NaN	NaN	0.0	...	NaN	NaN	NaN	NaN	NaN	0.0	NaN	NaN	NaN	NaN
Zoya	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
\O\" Is for Outlaw"	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN