Monday, May 12, 2014

BICAM SLAM

Joan Sola, in his article on BiCamSLAM, explains the benefits of mounting mono-vision cameras on SLAM robots equipped for stereo cameras. He summarizes: "By using monocular algorithms on both cameras, the advantages of mono-vision (bearing-only, with infinity range but no 3D instant information) and stereo-vision (3D information only up to a limited range) naturally add up to provide interesting possibilities, that are here developed and demonstrated using an EKF-based monocular SLAM algorithm. Mainly we obtain: a) fast 3D mapping with long term, absolute angular references; b) great landmark updating flexibility; and c) the possibility of stereo rig extrinsic self-calibration, providing a much more robust and accurate sensor. Experimental results show the pertinence of the proposed ideas, which should be easily exportable (and we encourage to do so) to other, more performing, vision-based SLAM algorithms."  Basically, this means that mono-vision cameras, which have a flat lens, allow for an extended field of view.  Typically these mono-vision cameras are used singly, and they have a difficult time with depth perception.  However, the benefits of mono-vision cameras include minimal lens distortion (which long-range cameras suffer from), resulting in absolute angular references (more accuracy) and a larger field of view.  This field of view increases efficiency immensely: the more of a view the camera can get from a single snapshot, the less it needs to move and recalibrate.  This also means less need to correct errors between snapshots, resulting in, once again, more accuracy.  Now, with a second mono-vision camera providing stereo vision with flat lenses, a robot is able to use its second camera as a point of reference to calibrate itself and get a better feel for depth. By far the biggest benefit mono-vision cameras have is their superior visual range over stereo cameras.
Sola goes into this, stating "The drawback of stereo-based systems is a limited range of 3D observability (the dense-fog effect: remote objects cannot be considered), and that they strongly depend on precise calibrations to be able to extend it.". This means that although typical stereo cameras can see a decent maximum distance, the calibration work required to capture a scene at maximum distance is quite tedious. In Sola's writings he also goes over how the cameras will identify landmarks, stating "As a general idea, one can simply initialize landmarks following mono-vision techniques from the first camera, and then observe them from the second one: we will determine their 3D positions with more or less accuracy depending on if they are located inside or outside the stereo observability region.". In layman's terms, this means the cameras have a master-slave relationship in regards to identifying landmarks. The master camera identifies landmarks throughout its field of view, and then the second camera works with the first to decide where those landmarks sit in a 3D environment. The best part about these two cameras working together is that there is no radial distortion (the distortion caused by long-range cameras) to skew the inputs. I see a bright future for bi-mono-vision SLAM.
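To give a feel for the "dense-fog effect" Sola mentions, here is a small sketch of my own (not code from his paper, and the focal length and baseline are made-up values): a rectified stereo pair triangulates depth as Z = f*B/d from the disparity d, so a fixed half-pixel disparity error translates into a depth error that grows roughly with the square of the distance, and remote landmarks become unobservable in 3D.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Triangulated depth for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

f, B = 700.0, 0.12  # assumed focal length (pixels) and baseline (meters)

for Z in (1.0, 5.0, 20.0):
    d = f * B / Z                        # ideal disparity for depth Z
    worst = stereo_depth(f, B, d - 0.5)  # same landmark, 0.5 px disparity error
    print(f"Z = {Z:5.1f} m  disparity = {d:6.2f} px  "
          f"depth with 0.5 px error = {worst:6.2f} m")
```

At 1 m the half-pixel error barely matters, while at 20 m the estimated depth is off by several meters, which is exactly why a bi-monocular rig falls back on bearing-only updates for far-away landmarks.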





SLAM input options

Data to be processed for SLAM can be collected from a variety of devices.  The most common way is with stereoscopic cameras, though mono-vision SLAM has been getting more and more popular as of late.  Other mediums for SLAM include sonar, IR scanners, infrared range finders, and even WiFi signals. In some cases, attempts have even been made to use multiple devices together so that the strengths of one compensate for the weaknesses of another.  The issue with this is that it compounds the complexity of writing an algorithm for such methods.  Allow me to go a bit more in depth about the benefits and weaknesses of input devices. First off, I'll knock the weakest mediums from the list, starting with sonar.  According to U. Frese in his paper on SLAM, "..sonar data is usually so bad, that it is very hard to derive any reliable landmark information or even identify a landmark from it.". IR scanners are cheap and good at detecting nearby obstacles, but they have a short range and trouble detecting obstacle shape.  Infrared rangefinders, while not having the range problem, also have difficulty with shape recognition and with scanning glass surfaces.  Overall, vision sensors seem to be the best choice in most cases. G. Zunino reinforces this opinion with a list of the benefits and downsides of visual sensors in his paper on SLAM. The benefits are:

•Large amount of information.
•Capability of getting 3D information about the environment.
•Cameras are passive sensors, they don’t have to emit sound or light pulses as sonar and laser sensors.
The drawbacks are:
•High computing requirement to extract the information from the images.
•Vision is highly influenced by the lighting.
•It is still expensive

Sunday, May 11, 2014

SLAM progression through time

SLAM is not so much the process of mapping out a structure as it is the correction of inconsistencies from a machine's input devices (cameras, radar, infrared, etc.).  We have been able to autonomously map and navigate structures for quite a few years.  The issue is that this was only possible in computer-simulated environments with a set of wonderful, infallible, computer-generated input devices.  In such an environment, there is no need to look back over the information given.  The machine always knows its position, the direction it is facing, and its exact distance from every landmark in the room.  In the real world, input devices provide mostly estimates of distances and shapes, and due to this, robots get lost.  This is where SLAM comes in. As the robot inevitably becomes more and more lost, SLAM allows it to compare earlier pictures containing similar landmarks to where it is at the moment.  If it has recorded a dining room with 8 chairs in the past, and while lost in a hallway catches a snapshot through a doorway of 4 of those 8 chairs in the same pattern as before (the rest obstructed from view), it can guess that it has come back to that room even though its current map doesn't support that location.  It will then resize and cut its map until things are a bit more accurate than they were before.  Now, in the past, every time an iteration of SLAM was run (basically every time it took a picture), it needed to recompute the location of every landmark it had seen up until then.  A machine can only handle a few hundred landmark computations at once while remaining at real-time speeds; in comparison, a typical environment contains millions of landmarks that must be recorded. This meant that there was no way to autonomously map a real-world environment in real time. A decade ago this began to change.  One notable algorithm that attempted to fix this was Montemerlo and Thrun's FastSLAM algorithm.
Before the introduction of FastSLAM, it would take 100*100 calculations to deal with 100 landmarks.  FastSLAM brought this number down to roughly ln(100)*100. That is about 460 calculations compared to 10,000, a big win for SLAM.  Attached is an example of SLAM being done in a simulated environment with very few landmarks. Pretty boring looking, huh?  In contrast, here is a person running around with a one-lens camera, using SLAM to process what it sees as it travels throughout a house.  It's amazing the number of landmarks picked up throughout the process. Interestingly enough, the green circles shown are landmarks that have been recorded, while the red circles are possible landmarks being noted. Up until now I have primarily gone over stereo-vision SLAM.  Next time I will go over a few of the alternatives, including the mono-vision SLAM shown in the video.
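To put those numbers in code, here is a quick illustration of my own using the figures above (the real algorithms' constants and details differ, this is just the asymptotic shape):

```python
import math

def ekf_cost(n):
    """Naive per-update cost: every landmark interacts with every other."""
    return n * n

def fastslam_cost(n):
    """FastSLAM's tree-structured update: roughly n * ln(n) operations."""
    return n * math.log(n)

for n in (100, 1000, 100000):
    print(f"{n:6d} landmarks: naive ~{ekf_cost(n):14.0f}  "
          f"FastSLAM ~{fastslam_cost(n):12.0f}")
```

At 100 landmarks the gap is 10,000 versus about 460; at 100,000 landmarks it is ten billion versus about 1.2 million, which is what makes real-time mapping of real environments plausible.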

Montemerlo, M., Thrun, S., Koller, D., & Wegbreit, B. (2002, July). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In AAAI/IAAI (pp. 593-598).

SLAM - Long range camera warping effect

A popular way to map out landmarks in SLAM is to use two long-range cameras to capture images of a scene.  Part of SLAM involves reconstructing the environment seen in these images and rebuilding it in a 3D environment containing objects.  When a long-range camera takes pictures, the lenses shift to allow a farther distance to be seen.  There is an issue with this: the pictures captured are slightly warped by the lens, by differing degrees depending on the distance. When attempts are made to reconstruct what was captured, severe fragmentation is observed in several parts of the model.  Zhou and Koltun introduce their attempted solution in their paper on SLAC, stating "This compact parameterization enables extremely efficient simultaneous localization and calibration (SLAC). As a result, our approach can reconstruct real-world scenes while estimating and correcting for range distortion in real time".  This basically means their algorithm derives an undistorted version of the image from the long-range image given, in order to properly map out an area without strange-looking fragments appearing. They also state in their abstract what makes their approach different from others: "Our approach directly recovers a camera trajectory alongside the distortion model. This distinguishes it from reconstruction approaches that simply deform the input data without performing distortion estimation and camera localization [14]."  This means that unlike typical means of correction, which warp the scene as soon as it comes in, their algorithm uses the distortion data to better decide the location of the autonomous robot before correcting the image.  These robots can't simply store all of these pictures after they capture them, due to memory and speed requirements.
They must take the data from a landmark, put it into a compact form, and compare it to all the compacted data from their old landmarks to keep the map of an area as accurate as possible. Theoretically they could store all of this information, but to keep SLAM running in real time the memory would cost a fortune, and in the end most of the information isn't necessary.
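For a rough idea of what distortion correction looks like in practice, here is a minimal sketch using a generic one-parameter radial model. This is a textbook model of my own choosing, not the SLAC parameterization from Zhou and Koltun's paper (their model handles depth-camera range distortion, which is more involved).

```python
def distort(x, y, k1):
    """Apply one-parameter radial distortion to normalized coordinates."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return x * scale, y * scale

def undistort(xd, yd, k1, iterations=10):
    """Invert the radial model by fixed-point iteration."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2
        x, y = xd / scale, yd / scale
    return x, y

xd, yd = distort(0.3, 0.4, k1=-0.2)  # warp a point toward the center
x, y = undistort(xd, yd, k1=-0.2)    # then recover it, close to (0.3, 0.4)
print(x, y)
```

The inversion has no closed form, so it is done iteratively; the point of SLAC is that the distortion parameters themselves are estimated jointly with the camera trajectory rather than assumed known, as they are in this sketch.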

On page 2 of Zhou and Koltun's paper there are 3 pictures of a car being 3D-reconstructed from SLAM data.  The first is done without their SLAC algorithm running.  The second is done in real time (online). The third is done with no time constraints (offline), resulting in a slightly better rendering than the second, yet both are much better than the first and have no major deformities.  The biggest thing to notice is the closeness in quality between the offline and online versions.  This is what SLAM developers strive for: the ability to do things as quickly as possible with nearly the same accuracy as a system with no time and computing constraints.

Zhou, Q. Y., & Koltun, V. (2014). Simultaneous Localization and Calibration: Self-Calibration of Consumer Depth Cameras.

[14] Zhou, Q.-Y., Miller, S., & Koltun, V. (2013). Elastic fragments for dense scene reconstruction. In ICCV.

SLAM

Simultaneous Localization And Mapping is an important field of research in AI and robotics.  It involves an autonomous machine being able to both map and navigate a completely unknown environment.  What really makes this difficult is the word "simultaneous" in SLAM: the machine must be able to collect data, know where it is on its map, and create the map all at once.  Now, unbeknownst to most, sensor technologies (cameras, infrared distance finders, sonar) all share a flaw.  They have a terrible habit of being inaccurate when you need them most.  By itself this is not an issue for a human: a few centimeters here, a few inches there.  The true issue comes when a machine uses these readings to decide its location in its environment.  The errors compound with every new incorrect input, and eventually the machine may think it is on the other side of its map when it is in fact in the room it started in.  For this very reason there are whole subfields of SLAM dedicated to recognizing and correcting errors in the machine's map after the fact.  The final issue is that every time the machine finds a new landmark, it must run SLAM's corrective algorithm on every landmark it has seen in the past, combined with the new landmark.  This results in SLAM developers having great difficulty making algorithms that don't force their robot to stop and compute for several hours before making its next move. The ability to let machines move and map at the same time is, by definition, what SLAM strives for. Now, there are several things that will make the demands of SLAM easier to meet with time.  The first is sensor accuracy.  With every year, our camera quality gets better and we develop improved measuring devices.  At the moment, SLAM developers can actually search and map in real time in a computer-simulated environment.  In a way, SLAM has very little use in such perfect-world scenarios where sensor inputs are exact.
The other avenue that will make the SLAM algorithm easier is our constantly improving processors.  However, with processor speed increases slowing down, we may have to wait for the introduction of quantum processing to make SLAM's resource use less taxing. Once SLAM has matured, it will allow a multitude of situations to be solved using autonomous robots.  Unmanned structural assessment of dangerous structures and automated search and rescue through dangerous environments are two of my favorite theoretical uses for SLAM.
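As a toy illustration of why raw sensor readings alone aren't enough (my own example, not from the cited papers), consider a robot integrating slightly biased odometry: the position error grows without bound, and this is exactly the drift that SLAM's corrective machinery exists to cancel.

```python
def dead_reckon(steps, bias=0.02):
    """Integrate biased odometry; return (estimated, true) position.

    The robot truly moves 1 m per step along x, but every odometry
    reading is off by `bias` meters. No correction is ever applied.
    """
    est = true = 0.0
    for _ in range(steps):
        true += 1.0
        est += 1.0 + bias  # every reading is slightly wrong
    return est, true

for steps in (10, 100, 1000):
    est, true = dead_reckon(steps)
    print(f"{steps:5d} steps: error = {est - true:.2f} m")
```

A 2 cm bias per step is a 20 m error after a kilometer; loop closures of the kind described above (recognizing the dining room with 8 chairs) are what pull the estimate back toward the truth.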



Contained here, on page 10 of this PDF, is a map of a search done by a machine without SLAM's path correction run.  Next to it is a map with SLAM used.




Montemerlo, M., Thrun, S., Koller, D., & Wegbreit, B. (2002, July). FastSLAM: A factored solution to the simultaneous localization and mapping problem. In AAAI/IAAI (pp. 593-598).

Smith, R., Self, M., & Cheeseman, P. (1990). Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles. Springer.