Jump to content

Finding things in common

- - - - -

  • Please log in to reply
8 replies to this topic

#1
Andy McCall

Andy McCall

    Newbie

  • Members
  • PipPip
  • 20 posts
Basically here's my problem I need to search a large list of users who themselves have a library of data that I am interested in. What I need to effectively be able to do is find say... 10 users who have the most data in common with me (the intersection of our libraries). What is clear to me is that I cannot iterate through the collection of users and compare my library to everyone else's because I will have a large library and there will be many users who may also have a large library. My question is this: Has anyone ever solved a problem like this before (and would be willing to give me information about it), or would you have any suggestions about concepts/algorithms that could help me solve this. I have a strategy in mind however I'm hoping you guys could give me new perspective.

#2
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
Are you using unique IDs for the libraries? It should be a pretty simple SQL query, if done well.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Andy McCall

Andy McCall

    Newbie

  • Members
  • PipPip
  • 20 posts
ok to give a little more information about the problem. The program itself is a music recommender service. Furthermore, my knowledge of sql is essentially none, so please bare w/ me. In theory though, all users would be have unique id's and all the songs (data in the library) themselves would have unique id's. Does this simplify things for me?

---------- Post added at 08:33 PM ---------- Previous post was at 08:31 PM ----------

BTW as a preemptive response I would love to know big O of any solutions suggested if possible.

#4
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
I would suggest you start here: SQL Tutorial

The big O depends a LOT on the database design, the definition of a "match" (which depends on how you compare music), etc. You can keep it fairly low, however.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#5
Andy McCall

Andy McCall

    Newbie

  • Members
  • PipPip
  • 20 posts
ok so I looked over the tutorial and I have an idea about a possible implementation. Before I get into that though, can a cell inside a table be another table?

Back to my implementation table of user's that contained a user id and all information associated w/ that user. furthermore, I had a table of songs that allowed duplicate songs and every song listed would have a user id associated with it. Using the INNER JOIN that is in the tutorial I could pick out users that are associated w/ the given song. If i did that for every song in my library I'd end up w/ several tables of users. Then I could perform a UNION ALL and get all the users who share similar libraries to me. Then from that I could select the users that appear most frequently. That would probably work, but how long would that take If I have thousands of songs. Furthermore, It seems a tad memory intensive, not that I couldn't overlook memory intensive if it saves time.

#6
Andy McCall

Andy McCall

    Newbie

  • Members
  • PipPip
  • 20 posts
I believe the problem I trying to solve is called feature selection? right?

---------- Post added at 03:43 PM ---------- Previous post was at 03:33 PM ----------

@WingedPanther:
I took a look at the tutorial you posted about SQL. The only idea I could come up with after reading it is I could construct to tables: one for songs and one for users. The user table would have his unique id all applicable information about the user. Then I could construct a table of songs (allowing duplicates) each row would also have a user ID. The table of songs would then be populated with every song from every user. By that I mean If I had song "A" and you had song "A" in your library there would be 2 entries for song A in the table of songs. Then when I attempt to rank users I would use the join command for every song in my library the result being several tables of Users that have songs in common w/ me. Then I would unite all the tables and I would have all the user's I would be interested in.

#7
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
What you just described sounds like a many-to-many relation: -- 1 song can belong to many users, and 1 user can have many songs.
This is nearly always solved in a database by using a join-table like so:

Table song: ID, song_name
Table user: ID, user_name
Table song_user: song_id, user_id

Song sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]song_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]B[/TD]
[/TR]
[/TABLE]

User sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]user_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]tom[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]mike[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]john[/TD]
[/TR]
[/TABLE]

song_user sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]song_id[/TD]
[TD]user_id[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]3[/TD]
[/TR]
[/TABLE]

With this data song "A" belongs to tom and mike, song "B" belongs to tom, mike and john.

#8
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
You will want to understand self-joins. You can do things like,

SELECT COUNT(*), user1.user_id, user2.user_id from song_user user1 inner join song_user user2 on (user1.song_id = user2.song_id and user1.user_id <> user2.user_id)
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#9
Andy McCall

Andy McCall

    Newbie

  • Members
  • PipPip
  • 20 posts
Hey guys I really appreciate the help I'll try to show this thread to my group and get back to you guys if anymore questions arise.




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users