Basically here's my problem I need to search a large list of users who themselves have a library of data that I am interested in. What I need to effectively be able to do is find say... 10 users who have the most data in common with me (the intersection of our libraries). What is clear to me is that I cannot iterate through the collection of users and compare my library to everyone else's because I will have a large library and there will be many users who may also have a large library. My question is this: Has anyone ever solved a problem like this before (and would be willing to give me information about it), or would you have any suggestions about concepts/algorithms that could help me solve this. I have a strategy in mind however I'm hoping you guys could give me new perspective.
8 replies to this topic
#1
Posted 21 January 2012 - 05:39 PM
|
|
|
#2
Posted 21 January 2012 - 06:11 PM
Are you using unique IDs for the libraries? It should be a pretty simple SQL query, if done well.
#3
Posted 21 January 2012 - 06:33 PM
ok to give a little more information about the problem. The program itself is a music recommender service. Furthermore, my knowledge of sql is essentially none, so please bare w/ me. In theory though, all users would be have unique id's and all the songs (data in the library) themselves would have unique id's. Does this simplify things for me?
---------- Post added at 08:33 PM ---------- Previous post was at 08:31 PM ----------
BTW as a preemptive response I would love to know big O of any solutions suggested if possible.
---------- Post added at 08:33 PM ---------- Previous post was at 08:31 PM ----------
BTW as a preemptive response I would love to know big O of any solutions suggested if possible.
#4
Posted 22 January 2012 - 07:13 AM
I would suggest you start here: SQL Tutorial
The big O depends a LOT on the database design, the definition of a "match" (which depends on how you compare music), etc. You can keep it fairly low, however.
The big O depends a LOT on the database design, the definition of a "match" (which depends on how you compare music), etc. You can keep it fairly low, however.
#5
Posted 22 January 2012 - 09:39 AM
ok so I looked over the tutorial and I have an idea about a possible implementation. Before I get into that though, can a cell inside a table be another table?
Back to my implementation table of user's that contained a user id and all information associated w/ that user. furthermore, I had a table of songs that allowed duplicate songs and every song listed would have a user id associated with it. Using the INNER JOIN that is in the tutorial I could pick out users that are associated w/ the given song. If i did that for every song in my library I'd end up w/ several tables of users. Then I could perform a UNION ALL and get all the users who share similar libraries to me. Then from that I could select the users that appear most frequently. That would probably work, but how long would that take If I have thousands of songs. Furthermore, It seems a tad memory intensive, not that I couldn't overlook memory intensive if it saves time.
Back to my implementation table of user's that contained a user id and all information associated w/ that user. furthermore, I had a table of songs that allowed duplicate songs and every song listed would have a user id associated with it. Using the INNER JOIN that is in the tutorial I could pick out users that are associated w/ the given song. If i did that for every song in my library I'd end up w/ several tables of users. Then I could perform a UNION ALL and get all the users who share similar libraries to me. Then from that I could select the users that appear most frequently. That would probably work, but how long would that take If I have thousands of songs. Furthermore, It seems a tad memory intensive, not that I couldn't overlook memory intensive if it saves time.
#6
Posted 22 January 2012 - 01:43 PM
I believe the problem I trying to solve is called feature selection? right?
---------- Post added at 03:43 PM ---------- Previous post was at 03:33 PM ----------
@WingedPanther:
I took a look at the tutorial you posted about SQL. The only idea I could come up with after reading it is I could construct to tables: one for songs and one for users. The user table would have his unique id all applicable information about the user. Then I could construct a table of songs (allowing duplicates) each row would also have a user ID. The table of songs would then be populated with every song from every user. By that I mean If I had song "A" and you had song "A" in your library there would be 2 entries for song A in the table of songs. Then when I attempt to rank users I would use the join command for every song in my library the result being several tables of Users that have songs in common w/ me. Then I would unite all the tables and I would have all the user's I would be interested in.
---------- Post added at 03:43 PM ---------- Previous post was at 03:33 PM ----------
@WingedPanther:
I took a look at the tutorial you posted about SQL. The only idea I could come up with after reading it is I could construct to tables: one for songs and one for users. The user table would have his unique id all applicable information about the user. Then I could construct a table of songs (allowing duplicates) each row would also have a user ID. The table of songs would then be populated with every song from every user. By that I mean If I had song "A" and you had song "A" in your library there would be 2 entries for song A in the table of songs. Then when I attempt to rank users I would use the join command for every song in my library the result being several tables of Users that have songs in common w/ me. Then I would unite all the tables and I would have all the user's I would be interested in.
#7
Posted 23 January 2012 - 01:18 AM
What you just described sounds like a many-to-many relation: -- 1 song can belong to many users, and 1 user can have many songs.
This is nearly always solved in a database by using a join-table like so:
Table song: ID, song_name
Table user: ID, user_name
Table song_user: song_id, user_id
Song sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]song_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]B[/TD]
[/TR]
[/TABLE]
User sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]user_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]tom[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]mike[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]john[/TD]
[/TR]
[/TABLE]
song_user sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]song_id[/TD]
[TD]user_id[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]3[/TD]
[/TR]
[/TABLE]
With this data song "A" belongs to tom and mike, song "B" belongs to tom, mike and john.
This is nearly always solved in a database by using a join-table like so:
Table song: ID, song_name
Table user: ID, user_name
Table song_user: song_id, user_id
Song sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]song_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]A[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]B[/TD]
[/TR]
[/TABLE]
User sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]id[/TD]
[TD]user_name[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]tom[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]mike[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]john[/TD]
[/TR]
[/TABLE]
song_user sample data:
[TABLE="class: grid, width: 350"]
[TR]
[TD]song_id[/TD]
[TD]user_id[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]3[/TD]
[/TR]
[/TABLE]
With this data song "A" belongs to tom and mike, song "B" belongs to tom, mike and john.
#8
Posted 23 January 2012 - 07:47 AM
You will want to understand self-joins. You can do things like,
SELECT COUNT(*), user1.user_id, user2.user_id from song_user user1 inner join song_user user2 on (user1.song_id = user2.song_id and user1.user_id <> user2.user_id)
SELECT COUNT(*), user1.user_id, user2.user_id from song_user user1 inner join song_user user2 on (user1.song_id = user2.song_id and user1.user_id <> user2.user_id)
#9
Posted 23 January 2012 - 10:39 AM
Hey guys I really appreciate the help I'll try to show this thread to my group and get back to you guys if anymore questions arise.
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users


Sign In
Create Account


Back to top









