sqrt(2) is rational?


logic is a funny thing. there’s syntax and there’s its meaning. totally independent.
it is quite frequent in maths that while syntax stays the same, meaning changes. the reason why syntax stays the same is: so you can make the same claims. write the same thing over and over again, only the meaning changes. this way the deep insight is shared that abstractly it’s the same thing.
for example numbers. we’re used to a decimal system, 10 digits. abstractly it’s the same as a binary system though. instead of denoting which number-system is meant, be sloppy. adding the b to every number over and over again is redundant. write once you’re talking of binary numbers, and everybody will understand.
only problem is when quoting someone. you’d need to repeat the definitions. so a non-mathematician might write 1+1=10 because someone wrote that. more correct would be to first define what the numbers mean. i.e. say: a number a_na_{n-1}\ldots a_1a_0 means \sum_{i=0}^n 2^i\cdot a_i
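to make this concrete, here's a quick sketch in python (the function name from_binary is my own invention) that reads a digit string according to exactly that definition:

```python
# a digit string a_n ... a_1 a_0 in binary means sum over i of 2^i * a_i
def from_binary(digits: str) -> int:
    return sum(2**i * int(a) for i, a in enumerate(reversed(digits)))

print(from_binary("1") + from_binary("1"))  # 2
print(from_binary("10"))                    # 2 as well, so "1+1=10" holds in binary
```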

this will be the 1+1=10 kind of proof, i.e. change the definition to get something unexpected.
the definition of real numbers based on rational ones is to look at limits of nested open rational intervals. every real number is given by a (non-unique) sequence of open intervals, each containing the next. and every such sequence whose interval-lengths converge to zero represents some real number.
this real number then will be contained in every single interval of that sequence.

more exactly, every real number is a whole lot of such sequences, a whole equivalence class. two sequences are in the same equivalence class if eventually their intervals mutually contain each other.
i.e. there is a sub-sequence of one whose intervals lie inside the intervals of the other, at the same index, and also the other way around. this works because the intervals are open, by definition containing points other than their borders.

in a way real numbers are infinitesimal intervals, while rational numbers are actual points you can draw.
but then, \sqrt 2 is also a point you can draw! just take a 1×1 square, its diagonal will have length \sqrt{1^2+1^2}.
well, ability to draw means it’s constructible on paper with a ruler and compass. so, no, this can’t be the proof.
so suppose \left(\frac pq\right)^2=2, i.e. p^2=2\cdot q^2, for whole numbers p,q:
obviously p^2 is even, hence p must be an even number. but then p^2 is divisible by 4, hence q must be an even number too, and then p^2 must be divisible by 8. and so on.
in the end both, p and q must contain all powers of 2. i.e. they must be infinite in size.
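of course a computer can't run that descent forever, but a finite brute-force search in python (just a sanity check, not a proof) illustrates that p^2=2\cdot q^2 never holds:

```python
# search a finite range of whole numbers for p^2 == 2*q^2; the descent argument
# above shows why this list must stay empty no matter how far we search
solutions = [(p, q) for p in range(1, 500) for q in range(1, 500) if p * p == 2 * q * q]
print(solutions)  # []
```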

is that allowed? in the axioms of ZFC it isn’t. actually it isn’t the axioms on their own that forbid it. what isn’t allowed is to create an expression that is infinite in length. in the language of logic this is clearly forbidden: you can’t have logic-expressions “converge” to something.
but why not? in fact, there are several approaches to create a set \sqrt 2 is contained in, as a point!

i.e. you can actually turn some real numbers into actual constructible points. apart from the circle, other shapes could be used as construction tools too. naturally the construction of those other tools requires some kind of approximation.
but then, if you restrict yourself to the 1-dimensional line, rational numbers are an approximation of their own already!
it’s possible to check if a rational number is what it claims to be: just use the compass to copy the length {p\over q} exactly q times along the straight line and see that you arrive at the whole number p. in the same way also other approximations can be verified for accuracy.

math isn’t just geometry. it’s also about formulas and functions. most prominent example is derivatives and their inverse, integration.
every formula has a formal, symbolically computable derivative. but formal integration is not always possible. so, while derivatives are constructible by formulas, integration is an approximation.
hence, real numbers created by integrals of formulas can be verified by derivative. but as language allows only finite expressions, these real numbers are countable.


English: Illustration of how the rational numbers may be ordered and counted. Illustration contains a grid and an arrow, and two Swedish words (the words for “nominator” and “denominator”). (Photo credit: Wikipedia)

the notion of countable means: there exists a one-to-one mapping into the natural numbers.
for example the whole numbers are countable: map every positive number to the odd numbers, and map the negative numbers together with zero to the even numbers (including zero).
a similar method works also for rational numbers. just interpret them as integer-coordinates on a half-plane.
then use some distance-function mapping them on natural numbers. and then add some ordering to points with the same distance.
(well, maybe this will skip some natural numbers. but it shouldn’t be a problem to re-index.)
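that counting strategy can be sketched in python, using |p|+q as the distance-function (the name rationals is my own; skipping duplicates plays the role of the re-indexing):

```python
from fractions import Fraction
import itertools

# enumerate all rationals: walk the grid of (p, q) with q >= 1 by increasing
# "distance" d = |p| + q, skipping values that were already counted
def rationals():
    seen = set()
    d = 1
    while True:
        for p in range(-(d - 1), d):
            f = Fraction(p, d - abs(p))
            if f not in seen:
                seen.add(f)
                yield f
        d += 1

print(list(itertools.islice(rationals(), 7)))
# Fraction(0, 1), Fraction(-1, 1), Fraction(1, 1), Fraction(-2, 1), ...
```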

no such strategy will work for infinite-length expressions. polynomials are countable, but power-series \sum_{i=0}^{\infty} a_i\cdot x^i aren’t. for every method to count them, one can create an element that isn’t counted (Cantor’s diagonal argument).
i.e. no map from the natural numbers to the real numbers can be surjective, so none has an inverse!

thanks to this limitation of logic, only countably many reals are constructible. an uncountable amount remains as intervals.
in logic the construction of a point cannot take up infinitely many steps. if it could, maths would be an intangible mess:
you could not prove consistency by finite-size proofs then.

Gödel’s second incompleteness theorem says that in our mathematical logic a consistent system (strong enough for arithmetic) cannot prove its own consistency.
the proof starts out with the observation that expressions can be counted. same with proofs. then it goes on showing how this counting can be implemented in the language of logic.
the problem arises when you start proving stuff about these numbers. then you get into the same mess as with my proof above: some things would require expressions of infinite length. and as history has shown, that’s where inconsistency easily creeps in.
when dealing with infinity, a lot of additional information is required. literally, an infinite amount of information, infinite sets of axioms and such. talking of infinity is always like leaning out of a window: easy to lose your grip on the ground.

 


middleschool maths in a nutshell



An example of a partial function that is not a total function. (Photo credit: Wikipedia)

unfortunately you won’t find this in a schoolbook. but truth is, maths is very limited in middleschool.
one important concept you learn there, after basic set-theory, is the notion of a function.

given 2 sets, you “map” each element of the first set onto whatever element you choose from the 2nd.
no need for all elements of the 2nd set to have something mapped to it.
graphically you just draw arrows from the elements of the first set onto elements of the 2nd.
this way quite naturally a new set is created: the image of your function. that’s a subset of the 2nd set.
if you only map part of the elements of the 1st set onto something, you get a partial function.
a partial function quite naturally creates another set: the definition range, the set of elements that actually get mapped. it’s a subset of the 1st set.
an actual (total) function is one where the domain (that’s what the 1st set is called) and the definition range are one and the same set.
also functions are distinguished by the properties “injective” and “surjective“.
injective is a function where no element in the image has two or more “arrows” pointing to it.
surjective is when the image isn’t just subset of the set the function is mapping to. it’s when the full set is covered by the image.
bijective function has both properties, it’s injective and surjective. and therefore one can define an inverse function: with arrows pointing into the opposite direction.
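for finite sets the arrow-picture is literally a dict in python, and the two properties are easy to check (a minimal sketch, function names my own):

```python
# a finite function as a dict of arrows: domain element -> codomain element
def is_injective(f: dict) -> bool:
    return len(set(f.values())) == len(f)       # no value hit by two or more arrows

def is_surjective(f: dict, codomain: set) -> bool:
    return set(f.values()) == codomain          # the image covers the full set

f = {1: "a", 2: "b", 3: "a"}
print(is_injective(f))               # False: 1 and 3 both point to "a"
print(is_surjective(f, {"a", "b"}))  # True: both "a" and "b" are hit

g = {1: "a", 2: "b"}                 # bijective, so the arrows can be reversed
inverse = {value: key for key, value in g.items()}
print(inverse)                       # {'a': 1, 'b': 2}
```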

a much better way to depict a function is to draw its values. visualize it so the viewer can predict what values it will have.
a common way to depict functions mapping points of a line or plane onto another line or plane, is to draw the points into a bigger coordinate system. that picture is called a Graph of that function.
just designate 1-2 coordinates for the domain, and draw a point in the remaining coordinates according to the function’s output.

if it’s a function from real numbers onto a 2d-plane, it’s more common to draw a curve in that plane.
maybe also add some arrows and markings to the curve to show the sequence in which the points get added.
of course, if you’d just draw the same function into 3d-coordinates as a graph, you’d get something different.
but just look at the 3d-image from the direction of the coordinate-axis which you did choose for the domain.
projecting the 3d-curve onto a plane orthogonal to that axis, you get a plane with a 2d-curve in it: the drawing I described at the beginning of this paragraph. so it isn’t entirely different.

all this works great for smooth functions. the viewer can just imagine all the points in between the ones you draw.
one must take care that such points in between are what the viewer expects though.
one must choose wisely what part to show and how much to magnify the function.
calculating the extrema of a function serves, among others, the purpose of acquiring that knowledge.
so later on in the text I’ll talk a bit about derivatives.

a function is just a glyph (or whatever decoration) along with 2 sets and some description.
this glyph is representing the function’s name.
as above the 1st set is the Domain, and the 2nd set is the value-range. the set of values the function might output, the image, is a subset of the latter.

you might remove a single element from the value-range, one that isn’t in the image, and technically you get a new function.
this way a function is more like a procedure in a computer-program.
there too, altering the type of the variables used as input or output basically gets you a new thing,
even when the code defining it hasn’t changed.

another similarity to computer-programming is the notion of a variable.

to write the “description” of a function mostly formulas are written.
but the description could also be a set of tuples, representing those “arrows” you’d see when depicting the function as 2 sets connected by such a mapping.
a third possibility is to write some algorithm, alike to an actual computer-program. and sometimes the description will be just some plain text.

no matter how your function is described, in front of the description you see something like “f(x)=“.
there the “f” is the glyph used for the function, it might be a greek letter or a word, even hebrew letters might be used.
and the “x” is the variable, alike to variables in programming languages. again the variable is written as another glyph that might even be greek or hebrew and/or have various decorations.

actually a function might not be that new a concept at all: whenever we use some device we encounter that principle.
you just do something and something else you will get in return, input and output, cause and effect.
however, what really is new in middle-school is the idea of variables and formulas.
you have some text, written in whatever language, maybe computer-program, maybe plain text, maybe in the language of mathematics or logic.
in that text you have strange letters or whatever glyphs, maybe whole words, that somehow don’t make sense.
but in the context of describing a function those are meant to be seen as variables.
to the reader it means that their meaning will be defined later on. for now there is some info on what they might contain though.
so when you have to evaluate a function, you read something alike “f(3)“, it means that in the function’s description that started with “f(x)=“, after the “=” you’ll have to replace x by the value 3, and then you’ll read that altered description again to learn what value the function will output.
and it gets even more complicated when you see the glyph used for the function as a variable too. maybe the function isn’t given in a defining way? maybe that function-name is part of a formula?
I say, learn programming! once you can do that, this aspect of maths shouldn’t be a problem.
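indeed, in python the whole “f(x)=” business is just a function definition, and evaluating f(3) substitutes 3 for x:

```python
# the description "f(x) = x**2 + 1" as code; the glyph f names it, x is the variable
def f(x):
    return x**2 + 1

print(f(3))  # 10: in the description, x got replaced by the value 3
```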

well, that’s not all, there’s another concept to learn in middleschool, starting already at the beginning of mathematical education.
it starts out as multiplication. it continues with division and polynomials and their roots, and finally ends with trigonometry. all these things are really just about exponentiation.

the fundamental claim about natural numbers (the fundamental theorem of arithmetic) is that you just combine prime-numbers by multiplication to get everything.
sometimes the same prime number must be repeated several times, so you abbreviate this by exponentiation. for example 27=3^3=3\cdot3\cdot3.
this imposes a new operator onto the natural numbers. same operator can be extended to real numbers and complex numbers.
an operator, binary in this case (since it works on 2 variables), is just a function like above.
so in middle-school there are 3 binary operators: plus, times, and “to the power of”. and there is one unary function too: \ln x or {_e\!\log x}
please note it is no omission that subtraction and division are not listed: they can be expressed through those 3 operators!
to subtract you just multiply one number with “\text{-}1“ and add it. to divide you take something to the power of “\text{-}1“ and multiply.

the important formulas are:
a-b=a+b\cdot(\text{-}1) and {a \over b}=a\cdot b^{\text{-}1}. and keep in mind (\text{-}1)\cdot(\text{-}1)=1 as well as (a^b)^c=a^{b\cdot c} and a^b\cdot a^c=a^{b+c} and a^c\cdot b^c=(a\cdot b)^c.
that’s just the beginning. in middle-school you also learn about roots, most prominent the square-root \sqrt a. but for each exponent there is a root inverting its exponentiation.
again it is no omission that I didn’t list roots together with the other 3 operators. the basic formula here is \sqrt[b] a=a^{1\over b}=a^{(b^{\text{-}1})}.
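all of these formulas can be checked numerically, a small sketch in python (floating-point equality via math.isclose):

```python
import math

a, b, c = 2.0, 3.0, 5.0
assert math.isclose(a**b * a**c, a**(b + c))  # a^b * a^c = a^(b+c)
assert math.isclose((a**b)**c, a**(b * c))    # (a^b)^c = a^(b*c)
assert math.isclose(a - b, a + b * (-1))      # subtraction via multiplying with -1
assert math.isclose(a / b, a * b**(-1))       # division via the power -1
assert math.isclose((a**(1 / b))**b, a)       # the b-th root a^(1/b) undoes ^b
print("all exponent laws check out")
```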

the brackets I put there because exponentiation differs in one important aspect from times and plus: it makes a big difference how you put the brackets when several exponentiation-operations are chained together.
i.e. a^{(b^c)}\ne(a^b)^c. so be careful and make use of brackets in such cases.
and also a^b\ne b^a, exponentiation isn’t commutative either. that’s another reason to be careful when writing it as an infix operator like “^”:
we simply are used to swapping around the inputs of binary operators.

finally, 2D-trigonometry is handled by complex numbers in combination with exponentiation.
a complex number is just a term of the form a+b\cdot i and the rule that i\cdot i=i^2=\text{-}1.
quite prominent is the formula e^{\pi\cdot i}=\text{-}1 where e is the euler number.
now the euler number is quite an elusive number: it isn’t a rational number and it cannot be expressed through polynomials or their solutions, it is transcendental.
in that respect it is much alike to the number \pi, which in turn is merely the circumference of half a circle of radius 1.
so while \pi is described by approximating the half-circle’s circumference, e is described by (1+n^{\text{-}1})^n=({n+1\over n})^n with n growing towards infinity.
but much more enlightening about exponentiation is the formula e^x=\sum\limits_{k=0}^{\infty}{x^k\over k!}=1+x+{x^2\over 2}+{x^3\over 6}+{x^4\over 24}+\cdots=1+x(1+{x(1+{x(1+{x(1+\cdots)\over 4})\over 3})\over 2}).
it is enlightening because it also works for x being a rational or complex number. actually this formula is where all the stuff about “roots are just exponentiation” or “dividing is same as to the power of -1” comes from.
this formula makes exponentiation into a function, a unary function, a function in a single variable.
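here's a sketch of that series in python, evaluated from the innermost bracket outwards (the name exp_series and the cut-off of 30 terms are my choices):

```python
import math

# e^x as the truncated power series sum(x^k / k!), evaluated in Horner form:
# 1 + x(1 + x/2 (1 + x/3 (1 + ...)))
def exp_series(x: float, terms: int = 30) -> float:
    result = 1.0
    for k in range(terms, 0, -1):   # start at the innermost bracket
        result = 1.0 + x * result / k
    return result

print(exp_series(1.0))   # about 2.718281828..., the euler number
print(math.exp(1.0))     # the library value, for comparison
```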

to get totally minimalistic one could define a^b=e^{b\cdot\ln a} for positive numbers. when a is negative, think of it as a^b=(\text{-}1)^b\cdot(\text{-}a)^b=e^{b\cdot\pi\cdot i}\cdot e^{b\cdot\ln(\text{-}a)}=e^{b\cdot(\ln(\text{-}a)+\pi\cdot i)}.
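with complex numbers on a computer this is directly checkable. python's cmath uses the principal branch of the logarithm, where a^b=e^{b\cdot(\ln(\text{-}a)+\pi i)} for negative a:

```python
import cmath
import math

a, b = -2.0, 0.5                                            # (-2) to the power 1/2
via_formula = cmath.exp(b * (math.log(-a) + math.pi * 1j))  # e^(b*(ln(-a) + pi*i))
via_python = complex(a) ** b                                # built-in complex power
print(via_formula)                                          # about 1.414j, i.e. sqrt(2)*i
print(via_python)
```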


This is a demonstration that Exp(i*Pi)=-1 (called Euler’s formula, or Euler’s identity). It uses the formula (1+z/N)^N –> Exp(z) (as N increases). The Nth power is displayed as a repeated multiplication in the complex plane. As N increases, you can see that the final result (the last point) approaches -1, the actual value of Exp(i*pi).

sounds quite complicated.
but take a look at this formula: R\cdot e^{\varphi\cdot i}.
and now imagine \varphi to be the length of a small arc of a circle with radius 1.
the output of this formula is a point on a circle of radius R, at the same angle as that arc of length \varphi did measure.

the output is a complex number.
a point in the complex plane. a plane made up of (x,y) for each complex number x+i\cdot y
small positive angles this formula maps into the upper right quarter of that plane,
going counterclockwise, with angle \varphi=0 being mapped onto the positive half of the x-axis.

therefore one can imagine e^{b\cdot(\ln(\text{-}a)+\pi\cdot i)} as the formula (\text{-}a)^b=e^{b\cdot\ln(\text{-}a)} rotated by b times half a circle, i.e. by b\cdot 180°.

so no actual exponentiation is needed, just the two functions \exp(x)=e^x and its inverse function \ln(x)={_e\!\log x}.
I repeat, it would be sufficient to have just plus, times as operators and \exp and \ln as unary functions.

and being minimalistic might sound like a funny useless game, but here it plays an important role for thinking abstractly:
the concept of dualities is quite prominent in maths. sometimes a duality is between 2 opposites, sometimes it’s between 2 similar things.
here we have both, a duality between the 2 operators, and a duality between a function and its inverse function.
additionally there seems to be a duality-like relationship between binary operators and unary functions.

another unary function I already mentioned and even used above: \ln. the inverse function to exponentiation of the euler number (e^{\ln x}=x).
it also has the property that no matter what number a you take to the x-th power, you can still retrieve the original x with the help of \ln.

the way to do it is by taking advantage of the general formulas above and dragging them over to the \ln function.
this way you get \ln(a\cdot b)=\ln(a)+\ln(b) and \ln(a^b)=b\cdot\ln(a), useful formulas when coping with this function.
it’s because e^{\ln(a\cdot b)}=a\cdot b=e^{\ln(a)}\cdot e^{\ln(b)}=e^{\ln(a)+\ln(b)} and e^{\ln(a^b)}=a^b=(e^{\ln(a)})^b=e^{\ln(a)\cdot b}.
so when you have a^x=b given, you can get x by applying \ln to both sides: x\cdot\ln a=\ln b, hence x={\ln b\over\ln a}.
that’s where the formula {_a\!\log x}={\ln x\over\ln a} comes from.
in computer-programming often {_2\!\log} is used instead of using \ln because the processor has some machine-language command built in for that and maybe not for \ln.
as you probably can see, the formulas I’ve proven here can be used for any \log-function, not just \ln!
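python's math module exposes exactly this: math.log takes an optional base, and the base-change formula reproduces it:

```python
import math

x, a = 100.0, 2.0
print(math.log(x, a))             # log base 2 of 100, about 6.644
print(math.log(x) / math.log(a))  # the same value via ln x / ln a
print(math.log2(8.0))             # 3.0: the dedicated base-2 logarithm
```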

another property of \ln is a bit beyond the scope of what one learns in middle-school.
in the complex plane, one can see that this function is actually defined everywhere, except on (0,0), but only locally.
i.e. given any complex number, nearby the function is defined everywhere.
you just are not allowed to include (0,0), and you are not allowed to have a hole in your definition range which would enclose that point.
you always must leave out a ray, or a curve of monotonically growing absolute value, starting at that point and going out to infinity. across this cut the function would jump.
so there are not just all the different \log functions, each of them, including \ln, has different versions of their definition range, and the descriptions being altered accordingly.
this isn’t surprising when you know that exponentiation is not an injective function on the complex numbers.
thereby it isn’t bijective, and no inverse function exists. so what values does it assume that make it non-injective?

let’s go back to the discovery that R\cdot e^{i\cdot \varphi} describes a circle of radius R counterclockwise.
obviously this means
e^{i\cdot \varphi}=\cos\varphi+i\cdot\sin\varphi and
e^{\text{-}i\cdot\varphi}=\cos\varphi-i\cdot\sin\varphi.
but why a circle? as I promised, the formula e^x=1+x+x^2\cdot(2!)^{\text{-}1}+x^3\cdot(3!)^{\text{-}1}+x^4\cdot(4!)^{\text{-}1}+\cdots, without which using complex numbers for x wouldn’t make sense, is the explanation.
observe what happens when you plug in i\varphi, and what happens when you plug in \text{-}i\varphi:
i goes through 4 states in this formula: i^1=i, then i^2=\text{-}1, then i^3=\text{-}i, and i^4=1.
i^5=i again, and later in the sum the same pattern repeats on and on.
so looking at those first 4-5 terms is enough:
e^{i\varphi}=1+(i)\varphi+(\text{-}1)\varphi^2\cdot(2!)^{\text{-}1}+(\text{-}i)\varphi^3\cdot(3!)^{\text{-}1}+\varphi^4\cdot(4!)^{\text{-}1}\cdots
e^{\text{-}i\varphi}=1+(\text{-}i)\varphi+(\text{-}1)\varphi^2\cdot(2!)^{\text{-}1}+(i)\varphi^3\cdot(3!)^{\text{-}1}+\varphi^4\cdot(4!)^{\text{-}1}\cdots
pretty much the same, just the imaginary part has become negative altogether.

changing the sign of the imaginary part of a number (while leaving the real part as it is) is called conjugation.
it is described by drawing a line over the complex number. i.e. \overline{x+i\cdot y}=x-i\cdot y
therefore \text{-}i\varphi is conjugation of i\varphi. surprising is that also \overline{e^{i\varphi}}=e^{\text{-}i\varphi}=e^{\overline{i\varphi}}. thereby more generally \overline{e^z}=e^{\overline{z}}.

suppose e^{i\varphi}=x+i\cdot y. then also e^{\text{-}i\varphi}=x-i\cdot y
when you multiply both you get 1=e^0=x^2+y^2.
that’s the well-known formula for the circle of radius 1. so all points of e^{i\varphi} are on that circle!
think about it: x^2+y^2 is constantly 1, no matter what \varphi you put into e^{i\varphi}!
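both facts, the constant radius and the conjugation rule \overline{e^{i\varphi}}=e^{\text{-}i\varphi}, are easy to confirm numerically:

```python
import cmath
import math

for phi in [0.1, 1.0, 2.5, -3.0]:
    z = cmath.exp(1j * phi)
    x, y = z.real, z.imag
    assert math.isclose(x * x + y * y, 1.0)                    # always on the unit circle
    assert cmath.isclose(z.conjugate(), cmath.exp(-1j * phi))  # conj(e^(i*phi)) = e^(-i*phi)
print("e^(i*phi) stays on the circle of radius 1")
```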

exponentiation is well known for its strong growth, the bigger the real number the stronger the growth will be.
and it is well known that exponentiation will never be zero. but can it be negative?
obviously in the real numbers it can’t. it has positive values and is never zero. how can it cross the zero-point?
complex functions are difficult to depict, a function from 2d space to 2d space.
however, let’s say the input is a line in the complex plane, and output is the complex plane. the result is visible in 3d.

beware, this method will create a slightly biased picture. the line you choose determines how the thing will look.
take a straight line, and you get a curve in 3d. this curve moves in space as you move the line on the complex plane.
all those parallel lines in the domain make up a surface. it would seem other lines could be mapped to the same surface.
the perpendicular line to those parallel lines would then seem like a curve in the plane orthogonal to the axis you choose for depicting the domain.
but this curve likely is not the graph of an actual function. it’s just a mapping of real numbers onto a plane, a curve.
also you could use the graph of a helper-function in the domain instead of a straight line. those helper-function-graphs can be moved along a straight line pointing into the direction you did use for depicting function-values of your helper-function.
no intersections will happen. so a whole new shape will be created, depending on that helper-function you choose.

for example the exponential function. in the direction of the real line we all know this function.
also I have said here that e^{i\varphi} is a mapping from real numbers onto a circle in a plane.
so that’s what you’d get mapping the exponential function with the axis of the real numbers as input:
an exponential graph rotated around the x-axis. each y-z-slice orthogonal to that x-axis is a full circle.
however, just look at what you’d get if choosing the imaginary axis instead of the real one.
e^{i\varphi} would become a curve spiraling on a cylindrical surface, never changing size.
parallel lines look the same, just different size. that size grows/shrinks exponentially.
and if you’d use a logarithmical graph as a helper-function the result might look even more differently.
i.e. \exp(\ln y+iy)=e^{\ln y}\cdot e^{iy}=y\cdot e^{iy}, similar shape but linear instead of exponential growth.

however, making a 3d-animation where the angle of the straight line used would rotate, could give a good impression.
unfortunately I haven’t seen any program that could display such a 3d-film, even less create one…

what you get this way is some mix between exponential function and \sin and \cos.
those trigonometric functions are definitely not injective, they periodically repeat themselves. so does \exp.

the real values it assumes are positive and negative numbers. since e^{i\pi}=\text{-}1, its square will be 1 again.
multiply e^{i\pi} another time and you get back to -1. in general e^{n\cdot 2\pi i}=1 for all whole numbers n.
this makes \exp non-injective, each and every value gets assumed infinitely many times.
but things aren’t that bad. knowing \ln for a truncated \exp function is enough. so just cut off all the values that repeat and define the domain accordingly for a total function.
this way usually \ln is defined for all numbers except zero and a ray starting in zero going along the negative real numbers.

in the formula-collection you’ll find the definition:
\ln x=2\cdot({x-1\over x+1}+({x-1\over x+1})^3\cdot 3^{\text{-}1}+\cdots)=2\cdot\sum\limits_{n=0}^\infty({x-1\over x+1})^{2n+1}\cdot(2n+1)^{\text{-}1} for x>0
there also are formulas for 0<x<2, and they might look a lot more simple.
the problem lies in how they got created: pick a point and you’ll get it defined within a circle around that point.
since it cannot be defined at zero, such attempts will always give very limited results…
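the series from the formula-collection converges nicely for x>0; a quick python sketch (the cut-off of 60 terms is my choice):

```python
import math

# ln x = 2 * sum over n of ((x-1)/(x+1))^(2n+1) / (2n+1), for x > 0
def ln_series(x: float, terms: int = 60) -> float:
    u = (x - 1) / (x + 1)
    return 2 * sum(u**(2 * n + 1) / (2 * n + 1) for n in range(terms))

print(ln_series(10.0))  # about 2.302585...
print(math.log(10.0))   # the library value, for comparison
```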

for a function to calculate the negative half, just take such a \ln(\text{-}x) and add \pi\cdot i to its output, thereby rotating by 180°. i.e. \widetilde\ln x=\pi\cdot i+\ln(\text{-}x) will then be defined for x<0.
just use \ln and \widetilde\ln depending on where you are looking at.
so in addition to \sin\varphi=\text{Im} e^{i\varphi}={e^{i\varphi}-e^{\text{-}i\varphi}\over 2i} and \cos\varphi=\text{Re} e^{i\varphi}={e^{i\varphi}+e^{\text{-}i\varphi}\over 2}, we also now can write \arcsin x={\ln(ix+\sqrt{1-x^2})\over i} and \arccos x={\ln(x+\sqrt{x^2-1})\over i}
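those \arcsin and \arccos formulas really work, as cmath (with its principal-branch \ln and square root) confirms; the wrapper names are my own:

```python
import cmath
import math

def arcsin_via_ln(x: float) -> float:
    # arcsin x = ln(i*x + sqrt(1 - x^2)) / i
    return (cmath.log(1j * x + cmath.sqrt(1 - x * x)) / 1j).real

def arccos_via_ln(x: float) -> float:
    # arccos x = ln(x + sqrt(x^2 - 1)) / i
    return (cmath.log(x + cmath.sqrt(x * x - 1)) / 1j).real

for x in [0.0, 0.5, -0.3, 0.9]:
    assert math.isclose(arcsin_via_ln(x), math.asin(x), abs_tol=1e-12)
    assert math.isclose(arccos_via_ln(x), math.acos(x), abs_tol=1e-12)
print("trigonometry recovered from exp and ln")
```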

but forget trigonometric functions! do everything 2-dimensional directly on a calculator (a calculator with the ability to calculate complex numbers).
type R\cdot e^{i\varphi} to get a point with distance R and angle \varphi, translating Polar Coordinates to Cartesian Coordinates.
use \ln(x+iy) to calculate \ln R and \varphi. the former in the real part, the latter in the imaginary part of the output.
add to that the knowledge that scaling up a triangle will also scale up each of its lines by the same factor.
maybe some Pythagoras (x^2+y^2=R^2) and you have trigonometry covered.

a bit more subtle is the idea of calculating the derivative for a function. what is it for?
abstractly seen the process of creating a derivative, Differentiation,  is just a function.
it takes another unary function as input and outputs a function in the same variable.
it is written by drawing a small vertical line above and next to the function, or just a dot above.
it transforms the function according to certain rules.

since there only are 2 operators and 2 functions, defining derivative is easy:

every term that is connected by plus to other terms gets handled individually. the derivative of a finite sum is the sum of individual derivatives. (a(x)+b(x))'=a'(x)+b'(x)
products are more complicated. a product becomes a sum with as many summands as the product had factors. in each summand a different factor is picked out and replaced by its derivative, while the other factors stay the same as without derivative. (a(x)\cdot b(x))'=a'(x)\cdot b(x)+a(x)\cdot b'(x) or (\prod\limits_{k=1}^N a_k(x))'=\sum\limits_{k=1}^N(\prod\limits_{l=1}^{k-1}a_l(x))\cdot a_k'(x)\cdot(\prod\limits_{l=k+1}^N a_l(x))
the variable x to the power of a constant becomes that constant times x to the power of that constant minus one. (x^c)'=c\cdot x^{c-1}
the euler-number to the power of x is already its own derivative, nothing changes. (e^x)'=e^x
the derivative of \ln x is x to the power of -1. (\ln x)'=x^{\text{-}1}
a function evaluated at the output of another function: here the derivative is the product of the derivatives of both functions. keep in mind that the outer function must first get the derivative applied before you insert the 2nd function into its variable. (a(b(x)))'=a'(b(x))\cdot b'(x)
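one way to gain confidence in these rules is to compare them against a numerical slope, the difference quotient; a small python sketch (names and the step size h are my choices):

```python
import math

def slope(f, x, h=1e-6):
    # numerical difference quotient (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
assert math.isclose(slope(lambda t: t**3, x), 3 * x**2, rel_tol=1e-5)  # (x^c)' = c*x^(c-1)
assert math.isclose(slope(math.exp, x), math.exp(x), rel_tol=1e-5)     # (e^x)' = e^x
assert math.isclose(slope(math.log, x), x**(-1), rel_tol=1e-5)         # (ln x)' = 1/x
# product rule: (a*b)' = a'*b + a*b'
assert math.isclose(slope(lambda t: t**3 * math.exp(t), x),
                    3 * x**2 * math.exp(x) + x**3 * math.exp(x), rel_tol=1e-5)
# chain rule: (a(b(x)))' = a'(b(x)) * b'(x)
assert math.isclose(slope(lambda t: math.exp(t**3), x),
                    math.exp(x**3) * 3 * x**2, rel_tol=1e-5)
print("all rules match the numerical slope")
```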

another way to write the derivative is f'(x)= {\partial f\over\partial x}(x).
in this last rule, let’s say the variable u stands for b(x).
then in this new way of writing, the rule would look like this: {\partial\over \partial x}a(b(x))={\partial a\over\partial u}\cdot{\partial u\over\partial x}(x)

but in case you come upon a function that isn’t a combination of those operators and functions, there is a much more general definition.
take a look at {f(x)-f(x_0)\over x-x_0} and imagine x to be very close to x_0.
when you draw the graph of f (input in the x-coordinate, output in y), and you draw the graph of x\cdot f'(x_0) (a line through zero), you will notice that at x_0 they both have exactly the same slope.
the reason is that {y-d\over x}=k is the same as the k in kx+d, the line through (0,d).
so, when f(x_0)=d and x_0=0, then {f(x)-f(x_0)\over x-x_0}={f(x)-d\over x}.
for each x a line is depicted, a slope is chosen. the closer x comes to x_0, the better such a line approximates the actual function’s slope in that point.

one thing I should say about the geometry of linear functions:
all linear functions are of the form y=kx+d or implicitly ax+by=c.
both are the same, with k=\text{-}{a\over b} and d={c\over b}.
however the most important knowledge here is a completely different way of writing linear functions:
a linear function that depicts a plane in 3D or line in 2D is a function in 2 respectively 1 variables.
the surrounding space however has one additional direction, so there is a line perpendicular to the function.
for 2D this perpendicular line is determined by \begin{pmatrix} y \\ \text{-}x \end{pmatrix} for a vector \begin{pmatrix} x \\ y \end{pmatrix}.
in 3D there’s an operator commonly written as “\times“, the cross product (the “outer product” is how I learned it’s called).
just take any 2 vectors of the plane, apply that operator and you get the perpendicular vector.
\begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix}\times\begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix}=\begin{pmatrix} {\det\begin{pmatrix} a_y & b_y \\ a_z & b_z \end{pmatrix}} \\ {\det\begin{pmatrix} a_z & b_z \\ a_x & b_x \end{pmatrix}} \\ {\det\begin{pmatrix} a_x & b_x \\ a_y & b_y \end{pmatrix}} \end{pmatrix} with \det \begin{pmatrix} a_x & b_x \\ a_y & b_y \end{pmatrix}=a_x\cdot b_y-b_x\cdot a_y
once you have the perpendicular vector \vec q and a point on the surface/line \vec p, use the inner product (multiply each pair of components and sum them up) to create the formula:
\vec v\cdot\vec q=\vec p\cdot\vec q where \vec v=\begin{pmatrix} x \\ y \\ z \end{pmatrix} respectively \vec v=\begin{pmatrix} x \\ y \end{pmatrix}
when \vec p is zero, it is obvious why this works: the inner product is zero when both vectors are perpendicular to each other.
but more generally \vec a\cdot \vec b=\|\vec a\|\cdot\|\vec b\|\cdot\cos\angle(\vec a,\vec b)
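a sketch of both products in python (function names my own); the printed dot products being zero confirms that the cross product really is perpendicular to both inputs:

```python
def cross(a, b):
    # the perpendicular vector in 3D, component-wise via the 2x2 determinants above
    return (a[1] * b[2] - b[1] * a[2],
            a[2] * b[0] - b[2] * a[0],
            a[0] * b[1] - b[0] * a[1])

def dot(a, b):
    # inner product: multiply each pair of components and sum them up
    return sum(x * y for x, y in zip(a, b))

a, b = (1.0, 2.0, 0.5), (-1.0, 0.0, 3.0)
q = cross(a, b)
print(q)                     # (6.0, -3.5, 2.0)
print(dot(q, a), dot(q, b))  # 0.0 0.0: q is perpendicular to both
```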

when you look at a plane, and a point \vec p on it which isn’t the point closest to zero, then there will be a whole circle of points on the plane with exactly the same distance to zero.
also this whole circle will always have the same angle to the orthogonal vector.
so in this formula the distances are all the same as well as the \cos, throughout the whole circle.
therefore the inner product is constant on that circle.
remember, \vec a\cdot\vec b=\sum\limits_{k=1}^{\dim a}a_k\cdot b_k is linear in each component, multilinear!
since the equation \vec v\cdot\vec q=\vec p\cdot\vec q is linear and there are more than 2 points of the plane fulfilling it, that’s also the equation for all the other points.
i.e. as \cos\angle(\vec v,\vec q) increases, the distance decreases to compensate that. and the other way around.

for example we get \begin{pmatrix} x \\ y \end{pmatrix}\cdot\begin{pmatrix} \text{-}f'(x_0) \\ 1 \end{pmatrix}=\begin{pmatrix} x_0 \\ f(x_0) \end{pmatrix}\cdot\begin{pmatrix} \text{-}f'(x_0) \\ 1 \end{pmatrix} is tangential to the function in x_0.
subtract the x-term: y=f'(x_0)\cdot x+(f(x_0)-x_0\cdot f'(x_0))
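as a sketch: the tangent line built from this formula touches the graph at x_0 and stays close nearby (here f=\sin, whose derivative \cos I supply by hand):

```python
import math

def tangent(f, df, x0):
    # y = f'(x0)*x + (f(x0) - x0*f'(x0)), the tangent line at x0
    k = df(x0)
    d = f(x0) - x0 * k
    return lambda x: k * x + d

t = tangent(math.sin, math.cos, 1.0)
print(t(1.0) - math.sin(1.0))  # about 0: the line touches the graph at x0
print(t(1.1) - math.sin(1.1))  # tiny: nearby the error is of order (x - x0)^2
```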

there are many applications for this. most prominently you can figure out minimum and maximum of a function.
it won’t tell you where those are, but it definitely shrinks down the set of candidates to a few points.
minimum and maximum can only happen at the border, or when there is no derivative at all, or when the derivative is zero.
derivative says something about the growth of a function. before it will start to grow or before it will start to decline, the function will go through a point where growth is zero.
alternatively the growth might make a jump, from one value to another. in the point where it jumps, the derivative does not exist, that’s a candidate for minimum or maximum too.
in a 3D graph, with 2 axes reserved for the domain, the border of the domain is a curve, and the function restricted to it is again a function in one variable. there could be a maximum or minimum anywhere on that border, you’d have to calculate the derivative along it to know where.

this was, in a nutshell, what I learned in middle-school and only grasped at uni.
I didn’t talk of probability and integration (the inverse operation to differentiation).
those I didn’t learn before uni, not from my teachers. it’s in the math-books though.
but based on above, it shouldn’t be too difficult.

if you’re an advanced mathematician, it is interesting to see here how little people in middle-school learn.
if you haven’t finished middle-school yet, or are re-learning all that stuff, good luck.
all those things are not difficult, although I might have made things seem difficult.
my goal with this posting was to show what abstract concepts there already are at middle-school.
and I wanted to point out how big an advantage it was to think abstractly in middle-school already.
as always I’m open to any critique or suggestions…