Jump to content

LIMIT

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
14 replies to this topic

#1
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts
As I was learning about using FULLTEXT searching (long story, and YES, I do need it)

I came across this comment in the MySQL Manaul

Quote

More about performance:
Fulltext search in MySQL isn't slow really. It's slower than normal index selects, but that's not noticable.
I'm playing tables that each contains ca 4million records, about 6GB of text that needs to be indexed. The problem lies in the optimization of the queries, that use the MATCH() and AGAINST() functions. So far as I found out MySQL can use in a query only one index. The optimizer looks for the index that will possibly give the smallest amount of rows, then goes trough all of these, and removes those records, that fall out because of the other WHERE statements.
When you enter a fulltext search, the server _has_ to use that index. Any other statments could be applied only to the rows that were returned by the search. If you have lots of records, most of the searches will return lots of results. The rest of your query will be executed like if you were executing it against a table that contains the results, and no indexes - obviously that will be processed sequentially.
Same thing applies when you use the SQL_CALC_FOUND_ROWS, or LIMIT with high offset values: first the server has to load all the results, then the limit could be applied.
If you _have_ to use the fulltext, these are some hints how you could get quite good results:
- Try to avoid any group, where, order and any statements for what it's necessary to get all the results. I know, usually this is impossible.
- If you need to show the results on a web page, try to find other methods to get the total number of results than using SQL_CALC_FOUND_ROWS. Better if you don't tell the user at all how many results there are ;)
- If indexing text that is in other language than english, before doing anything create a stopword file for your language! (That could reduce index size about 30%)
But most important: think a lot, before you decide to use fulltext search!

The bit that caught my eye was this ...

Same thing applies when you use the SQL_CALC_FOUND_ROWS, or LIMIT with high offset values: first the server has to load all the results, then the limit could be applied.


So, that means that when I am on page 15, with 10 records per page - the Server has to get 160 records, via a fulltext search and discard 150 of them ?

Yikes @ fulltext search across 5 columns each time they page up or down.

Would I not be better off building an array of the ID's then slicing the array, as you would use LIMIT within MySQL and doing a

SELECT * from table WHERE ID = $ID

ID being it's usual unique indexed key that we all use.

From my array of ID's rather than send the server off to get a load of records, just to discard them each time ?

Now, I've not even got LIMIT to work in this query - but it has been suggested that I should use it ..

My query currently looks like this ...


	$query= "SELECT * FROM $table WHERE MATCH $fulltext AGAINST '".mysql_real_escape_string($test) ORDER BY Master_Group,Sub_Group,Description,Dimensions"'";   

$table = table to be searched.

$fulltext is the list of columns to be searched using the index made by invoking a FULLTEXT index on the columns I want to check.

$test is what I'm looking for.

I am proposing to do this :


    $rs=mysql_query($query) or die(mysql_error());

	 // populate $result    

$count = 0;

while ($row = mysql_fetch_array($rs)) {

	 $result[$count] = $row['ID'};    

	 $count = $count + 1; // Total Number of rows returned 

}	 



So, this way I would take the 'hit' on the SELECT once - build my Array of the ID numbers, then wander off and slice it in the same way you would slice using LIMIT. to get the individual records.

I haven't even got LIMIT to work with the search query I'm using - Should I battle on, or is my method acceptable ?

Your thoughts on the matter would be gratefully received.
Thanks, Phill.

#2
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts
Having done some further looking into this, it SEEMS that getting the array of ID's then doing a

SELECT * FROM $table with ID = $ID

Is more efficient as the query then uses the primary, indexed, unique key of MySQL and should, therefore, be pretty darn quick, as I am using an search that the guys & gals who wrote MySQL optimised for speed of search.

Again, I await your comments - I'm fairly new to all this stuff !!!

Regards,

Phill.

#3
debtboy

debtboy

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 916 posts
You better believe queries of particular indexed IDs
are many times faster than repeated full text searches. ;)

In my opinion (everyone has one right), find an alternative
to your full text search query as they are inefficient and notoriously
slow, much like using cursors instead of sets.

If your looking for more optimizations, give more info...
How many records are in the database?
How many words (on average) is your full text search variable?
How many records do you expect in your result set?

but it sounds like you already resolved your own problem
and learned something about databases while at it
good work. :thumbup1:

#4
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts

debtboy said:

You better believe queries of particular indexed IDs
are many times faster than repeated full text searches. ;)

In my opinion (everyone has one right), find an alternative
to your full text search query as they are inefficient and notoriously
slow, much like using cursors instead of sets.

If your looking for more optimizations, give more info...
How many records are in the database?
How many words (on average) is your full text search variable?
How many records do you expect in your result set?

but it sounds like you already resolved your own problem
and learned something about databases while at it
good work. :thumbup1:
What can I say ? - Thanks ..

The thread

[other] MySql Injections - Ubuntu Forums

explains why I cannot see a better way to do the search.
It also throws into the mix stuff about preventing MySQL injections etc. A lot of it I 'sort of' follow - I've chosen

$query= "SELECT * FROM people WHERE name='".mysql_real_escape_string($name)."'";

And, we've had debates on magic quotes etc....

If you take the time to read the thread, I will be putting forward a script for FULLTEXT, and once everyone has decided whether to use LIMIT or an array of ID's I will allow them to continue arguing, and just use ID's as the MySQL manual seems to point towards a MAJOR hit if you keep using LIMIT.


Again, thanks for your reply,

Phill.

#5
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts

debtboy said:

You better believe queries of particular indexed IDs
are many times faster than repeated full text searches. ;)

In my opinion (everyone has one right), find an alternative
to your full text search query as they are inefficient and notoriously
slow, much like using cursors instead of sets.

If your looking for more optimizations, give more info...
How many records are in the database?
How many words (on average) is your full text search variable?
How many records do you expect in your result set?

but it sounds like you already resolved your own problem
and learned something about databases while at it
good work. :thumbup1:
thanks debtboy, it is, as they say, a "long story" .. to find out why I needed fulltext search and my origins of security on it, it is here ...

[other] MySql Injections - Ubuntu Forums

If you take the time, you will see that I have learned a lot & been told to go look at different ways of doing things.

It was when I needed to do 'FULLTEXT" that I read the comments on the manual page of the function and tried to understand what they meant.

Having got as far as I could on the security issue - I then wanted to split the modules so as not to be doing the SELECT time after time.

I dunno, maybe I'm doing it all wrong (quite probable), but, having taken the time to read the stuff in the manual & comments, is my way more efficient for a database with a SELECT that returns under 4Gb?

Everyone but, everyone me wants to use LIMIT - I have clearly seen that so doing makes call upon call on the MySQL server.

I just cannot get 'my head' round the logic of sending to the server all those requests, time after time. when it has already done them once.

In my little, voiceless, opinion - do the search, get the ID's in an array and then let PHP get on with it, only troubling the MySQL server for 10 records per time you want to page up / down & pass them to HTML parsing.

My current module actually holds ALL the fields required to build the table within the array $result, thus there wouldn't even be a 2nd call to the MySQL server.

I just can't work out why the norm is to keep calling the MySQL server to search for everything each time & then throw the majority away, instead of just asking it once and then looking after it afterwards.

I hate to say this - but it looks like lazy programming to me.

<<<----- Lights the Blue touch-paper & retires to a safe distance ------>>>

Phill.

#6
debtboy

debtboy

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 916 posts
Phillw,
This is a great thread... :thumbup1:

I myself use TOP (equiv of LIMIT) quite often in queries, but the
offsets aren't high and the datasets are relatively small.

I, like many others are of the mindset...
Let the database do most if not all of the processing as the
dbms is usually installed on much more capable hardware
and designed to do just that, whereas the web server is
much better at dealing with input and output. For this reason
I believe you should run the full text query once and only
extract IDs for the array not all values.
When it comes to paging...
I would dynamically create a new query each time by looping
through the array and appending values to the query
using "OR" statements.

SELECT Field1, Field2, Field3
FROM  Your_Table 
Where (ID = $array_value1
OR ID = $array_value2
OR ID = $array_value3
etc... )

but, that's just me and if holding all values in a large
array works for you than good.

I did read the post from the link you referenced and
found it informative as I have a lot to learn about
sql injection. Most of my work is on MSSQL server
or Oracle and I normally just escape the quotes.
I also don't have as many concerns as my environment
is a corporate intranet as opposed to the internet.

Good Luck with the project,
you site looks pretty good!!

#7
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts
Well, the more I read the more I realise that the use of LIMIT is what was learned in skool.
Thus, everyone believes it is THE answer. A quick / dirty/ can't be bothered with answer.
It is only when they finally are faced with FACTS do we seperate the "flat earth society" from those who accept that what they learned under V3.x of MySQL has been improved upon - I think it's called evolution. LIMIT stuff was introduced a while back - One to make you smile :
AT&T Unix, uNiverse as SQL ...
We had a db of c 500,000,000
Small fry back in the 1980's - lol
The developers were "well, sort of, yeah it should be okay"
Resize and re-index ..... Ermmmm ... well... Is it throwing any errors out ?

I've always been one to "throw a spanner" in the works - Just because

Quote

We've always done it this way
does not mean that it is the best way to do it.

The more I have read about LIMIT, the more it seems SO, SO wrong.

Phill.

#8
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts
:: SIGH::

Well it seems, when challenged, the

Quote

We've always done it this way

Have fallen silent.

So, I guess they're going to write code that hammers the server, provided they get output that the can parse to XHTML (Strict)

In amongst my discussions on various forums from the checking of the injection risks, to the CSS, to the XHTML etc. I've have, indeed, learned a lot.

I look forward to doing some "time trials" to see the timings of what I propose against what what is generally suggested.

I have a sneaky feeling that the parsing of the output will be a greater overhead than the getting by 'ID' the records once we get to page 5.... It may even be faster by page 3 ;)

I did like the option of not telling the user how many records were returned, but as my count has to fill the array in the 1st instance, I'll have that already without having to do a ROWS-RETURNED query :-P

So, is it populate with the information to display and then pure PHP, or do I just hold the 'ID' and then do a SELECT by ID via MySQL .... Decisions, decisions...

One thing for certain .... bye bye LIMIT :rules:

Phill.

#9
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts
Hi debtboy ... Just to let you know - I've gotten frankenstien's monster working. What started off as a little search module has taken on a life all of its own - lol

The advice that I've gleaned from all the contributers has been fantastic. When I'm happy with it all, I'll post how I did it - then everyone can argue over it ;-)

The rules it uses are ... if >3 characters use fulltext search - if 3 characters do column search on Description & Suitable for.

As you would have read in the initial thread on ubuntu, I have decided to go with paramtertised input for those pesky fractions of an inch.

The ID numbers of the returned results are stored in an array, with the display page just doing SELECT * from $Table with ID=$Array[$count]; as a do..while loop for offsetting for page-up / page-down.

The test area is currently in 'debug' mode - so, if you'd like to have a look and see how it returns the search string in different circumstances, it is displaying the search query that it has carried out to create the results.

To get some test data to try searches with, the following pages are populated with data.
Ignition and Door Lock Keys, Malleable Iron Fittings, Flexible Rubber Mountings, Oils and Lubricants, Pin Spanners, Stihl TS400, Partner K650/K700, and Dolmar DPC6400.

the link to my test area is © 2009 M.G. Judd Ltd Front Page You're welcome to have a 'play' with differing searches to see how it behaves.

Regards,

Phill.

#10
debtboy

debtboy

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 916 posts
Great to hear you have it working.
I'm positive it's very efficient also. :)
Good Work!!

#11
phillw

phillw

    Learning Programmer

  • Members
  • PipPipPip
  • 82 posts

debtboy said:

Great to hear you have it working.
I'm positive it's very efficient also. :)
Good Work!!

I now just await for the CC admin staff to allow my 'tutorial' on the joys of fulltext searching - If they do, I look forward to your comments on my logic tree of how I filter things through.

And, once again, thanks for your assistance on this, I'm glad that the discussions about mysql injections / paramtising etc. were of interest / help to you.

Regards,

Phill.

#12
debtboy

debtboy

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 916 posts
Definitely a huge help to me. :thumbup:
I create a lot of custom web based reports
from Oracle, Sql Server, Unisys and a few flat files.

Some of these DBs are rather large
(one more than 5 million, another more than 10 million records)
so query optimization is paramount!!

THANKS!!